UNIT-I CENTRAL TENDENCIES


1.1 INTRODUCTION 

In this chapter we introduce several statistical constants which
quantitatively describe some of the characteristics of a frequency
distribution. These concepts are also helpful in comparing two similar
frequency distributions.


The statistical constants that describe any given group of data are
chiefly of four types, viz.

(i) Measures of central tendency or measures of location
(ii) Measures of dispersion
(iii) Measures of skewness
(iv) Measures of kurtosis

Here we introduce several commonly used measures of central
tendency.


Definition. Measures of central tendency are "statistical constants which
enable us to comprehend in a single effort the significance of the whole".
Thus a measure of central tendency is a representative of the entire
distribution. The following are the five measures of central tendency
which are in common use.

1. Arithmetic mean (mean)
2. Median
3. Mode
4. Geometric mean
5. Harmonic mean


1.2 ARITHMETIC MEAN 


Definition. The arithmetic mean of n observations $x_1, x_2, \ldots, x_n$ is defined
by

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x_i}{n}$$


Central Tendencies 


NOTES 


self - Instructional Material 


This definition is useful when n is so small that grouping the
values into a frequency distribution is not necessary.


Note. Suppose $x_1, x_2, \ldots, x_n$ are the distinct values of a variate with the
corresponding frequencies $f_1, f_2, \ldots, f_n$. Then

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}, \quad i = 1, 2, \ldots, n$$

This may be thought of as a weighted average where the weight of $x_i$ is
the corresponding frequency $f_i$.


Definition. Let $x_1, x_2, \ldots, x_n$ be n numbers. Suppose with each $x_i$ there
is associated a weight $w_i$. Then the weighted average or weighted mean
of $x_1, x_2, \ldots, x_n$ is defined by

$$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i}, \quad i = 1, 2, \ldots, n$$

The usual arithmetic mean $\bar{x}$ is the special case of the weighted
arithmetic mean in which the weights are the corresponding frequencies.


Example. Consider the 10 numbers 18, 15, 18, 16, 17, 18, 15, 19, 17, 17.

Then $\bar{x} = \frac{18+15+18+16+17+18+15+19+17+17}{10} = \frac{170}{10} = 17$


The frequency distribution for the above data is

x_i    15   16   17   18   19
f_i     2    1    3    3    1

$$\bar{x} = \frac{(2 \times 15)+(1 \times 16)+(3 \times 17)+(3 \times 18)+(1 \times 19)}{2+1+3+3+1} = \frac{170}{10} = 17$$
Suppose the variates $x_1, x_2, \ldots, x_{10}$ are assigned the weights
1, 3, 3, 3, 2, 1, 2, 2, 3, 2. Then the weighted average is

$$\bar{x}_w = \frac{\sum w_i x_i}{\sum w_i}, \quad i = 1, 2, \ldots, 10$$

$$= \frac{18+45+54+48+34+18+30+38+51+34}{22} = \frac{370}{22} = 16.82$$
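The three calculations of this example can be checked with a short Python sketch. The function names `arithmetic_mean` and `weighted_mean` are illustrative choices and do not come from the text.

```python
def arithmetic_mean(values):
    """Plain arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)


def weighted_mean(values, weights):
    """Weighted mean: sum(w_i * x_i) / sum(w_i)."""
    total_weight = sum(weights)
    return sum(w * x for x, w in zip(values, weights)) / total_weight


data = [18, 15, 18, 16, 17, 18, 15, 19, 17, 17]
weights = [1, 3, 3, 3, 2, 1, 2, 2, 3, 2]

# Frequency table of the same data: distinct values and their counts.
distinct = [15, 16, 17, 18, 19]
freqs = [2, 1, 3, 3, 1]

print(arithmetic_mean(data))                   # 17.0
print(weighted_mean(distinct, freqs))          # 17.0
print(round(weighted_mean(data, weights), 2))  # 16.82
```

Passing the frequencies as weights reproduces the plain mean, which is the special case mentioned in the definition of the weighted mean.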


Definition. The arithmetic mean (A.M.) of a grouped frequency
distribution is defined to be $\bar{x} = \frac{\sum f_i x_i}{N}$, where $N = \sum f_i$ and $x_i$ is the
mid-value of the true class interval.


Example. For a frequency distribution, the mid-values $x_i$ of the true class
intervals are 4.5, 14.5, 24.5, 34.5 and 44.5 and the
corresponding class frequencies are 11, 20, 16, 36 and 17 respectively.

$$\bar{x} = \frac{\sum f_i x_i}{N} = \frac{(4.5 \times 11)+(14.5 \times 20)+(24.5 \times 16)+(34.5 \times 36)+(44.5 \times 17)}{11+20+16+36+17}$$

$$= \frac{49.5+290+392+1242+756.5}{100} = \frac{2730}{100} = 27.30$$


Note. The calculation of the arithmetic mean may be considerably simplified
by shifting the origin of reference and at the same time
altering the scale.

For example, if we take A as the new origin and take h units of the
variate $x_i$ equal to one unit of the new variate $u_i$, then $u_i = \frac{x_i - A}{h}$,
(i.e.) $x_i = A + h u_i$. Then the A.M. for the variate $x_i$ is calculated as follows.

$$\bar{x} = \frac{\sum f_i x_i}{N} = \frac{\sum f_i (A + h u_i)}{N} = A\left(\frac{\sum f_i}{N}\right) + h\left(\frac{\sum f_i u_i}{N}\right)$$

$$\therefore \bar{x} = A + h\bar{u}$$


Example. For the frequency distribution of the previous example, we have the following table for the
calculation of $\bar{x}$ by taking $u_i = \frac{x_i - A}{h}$, where A = 24.5 and h = 10.

Class        Mid x_i   Frequency f_i   u_i   f_i u_i
-0.5-9.5       4.5          11          -2     -22
9.5-19.5      14.5          20          -1     -20
19.5-29.5     24.5          16           0       0
29.5-39.5     34.5          36           1      36
39.5-49.5     44.5          17           2      34
Total           -          100           -      28

Here $\bar{x} = A + h\bar{u} = 24.5 + 10 \times \frac{28}{100} = 24.5 + 2.8 = 27.3$
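As a check on the change of origin and scale, the following Python sketch computes the mean both directly and through the variate $u_i$. The function name `mean_by_step_deviation` is an illustrative assumption; the data are the mid-values and frequencies of the example above.

```python
def mean_by_step_deviation(midpoints, freqs, origin, scale):
    """Mean of grouped data via u_i = (x_i - A) / h, then x_bar = A + h * u_bar."""
    n_total = sum(freqs)
    u_values = [(x - origin) / scale for x in midpoints]
    u_bar = sum(f * u for f, u in zip(freqs, u_values)) / n_total
    return origin + scale * u_bar


midpoints = [4.5, 14.5, 24.5, 34.5, 44.5]
freqs = [11, 20, 16, 36, 17]

# Direct formula: sum(f_i * x_i) / N.
direct = sum(f * x for f, x in zip(freqs, midpoints)) / sum(freqs)
shifted = mean_by_step_deviation(midpoints, freqs, origin=24.5, scale=10)
print(round(direct, 2), round(shifted, 2))  # 27.3 27.3
```

Any other choice of origin and scale gives the same mean; A = 24.5 and h = 10 are convenient only because they make the $u_i$ small integers.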


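Theorem 1.1. The algebraic sum of the deviations of a set of n values
from their arithmetic mean is zero.

Proof. Let $x_1, x_2, \ldots, x_n$ be the values with frequencies $f_1, f_2, \ldots, f_n$
respectively. Then

$$\bar{x} = \frac{\sum f_i x_i}{N}, \quad \text{where } N = \sum f_i$$

The deviation of $x_i$ from the A.M. is given by $d_i = x_i - \bar{x}$ (i = 1, 2, ..., n).

$$\therefore \sum f_i d_i = \sum f_i (x_i - \bar{x}) = \sum f_i x_i - \bar{x} \sum f_i = N\bar{x} - N\bar{x} = 0$$

Hence the theorem.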
Theorem 1.2. The sum of the squares of the deviations of a set of n
values is minimum when the deviations are taken from their mean.

Proof. Let $x_1, x_2, \ldots, x_n$ be the set of n values with the corresponding
frequencies $f_1, f_2, \ldots, f_n$. Then

$$\bar{x} = \frac{\sum f_i x_i}{N}, \quad \text{where } N = \sum f_i$$

Now, the sum of the squares of the deviations of $x_i$ from an arbitrary
number A is given by $Z = \sum f_i (x_i - A)^2$.

The value of A for which Z is minimum is determined by the
conditions $\frac{dZ}{dA} = 0$ and $\frac{d^2 Z}{dA^2} > 0$.

Now $\frac{dZ}{dA} = 0 \Rightarrow -2 \sum f_i (x_i - A) = 0$

$$\Rightarrow \sum f_i x_i - NA = 0$$

$$\Rightarrow A = \frac{\sum f_i x_i}{N} = \bar{x}$$

Also $\frac{d^2 Z}{dA^2} = 2 \sum f_i = 2N > 0$

$\therefore$ Z is minimum when $A = \bar{x}$.
Hence the theorem.
Theorem 1.3. If $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_k$ are the arithmetic means of $n_1, n_2, \ldots, n_k$
observations, then the arithmetic mean of the combined set of
observations is given by

$$\bar{x} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2 + \cdots + n_k \bar{x}_k}{n_1 + n_2 + \cdots + n_k}$$

Proof. $n_1 \bar{x}_1$ is the sum of all the $n_1$ observations in the first set.

$n_2 \bar{x}_2$ is the sum of all the $n_2$ observations in the second set.

$n_k \bar{x}_k$ is the sum of all the $n_k$ observations in the k-th set.

$\therefore \sum_{i=1}^{k} n_i \bar{x}_i$ is the sum of all the observations in the
combined set.

$$\therefore \bar{x} = \frac{1}{N} \sum_{i=1}^{k} n_i \bar{x}_i, \quad \text{where } N = \sum_{i=1}^{k} n_i$$

Hence the theorem.

Solved Problems. 

Problem 1. The heights in cm of 10 students chosen at random are
given by 164, 159, 162, 168, 165, 170, 168, 171, 154, 169. Calculate the
A.M.

Solution. Here n = 10.

$$\bar{x} = \frac{1}{n}\left(\sum x_i\right) = \frac{1}{10}(1650) = 165 \text{ cm}$$


Problem 2. Calculate the A.M. from the following frequency distribution.

Weight in kgs     50   48   46   44   42   40
No. of persons    12   14   16   13   11   09

Solution. We have the following table.

Weight in kgs x_i   No. of persons f_i   f_i x_i
50                  12                   600
48                  14                   672
46                  16                   736
44                  13                   572
42                  11                   462
40                  09                   360
Total               75                   3402

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{3402}{75} = 45.36$$


Aliter. Choosing A = 46 as the origin and h = 2 as the scale so that
$u_i = \frac{x_i - 46}{2}$, we get the following table.

x_i     f_i   u_i   f_i u_i
50      12     2     24
48      14     1     14
46      16     0      0
44      13    -1    -13
42      11    -2    -22
40      09    -3    -27
Total   75     -    -24

$$\therefore \bar{x} = A + h\bar{u} = 46 + 2 \times \left(\frac{-24}{75}\right) = 45.36$$


Problem 3. Calculate the A.M for the following frequency distribution of 
the marks obtained by 50 students in a class. 


Marks   No. of students    Marks   No. of students
05-10         5            25-30         5
10-15         6            30-35         4
15-20        15            35-40         3
20-25        10            40-45         2


Solution. Let us choose A = 22.5 as the origin and h = 5 as the scale, so that
$u_i = \frac{x_i - 22.5}{5}$, and we get the following table.

Class    Mid x_i   u_i   f_i   f_i u_i
05-10      7.5     -3      5    -15
10-15     12.5     -2      6    -12
15-20     17.5     -1     15    -15
20-25     22.5      0     10      0
25-30     27.5      1      5      5
30-35     32.5      2      4      8
35-40     37.5      3      3      9
40-45     42.5      4      2      8
Total       -       -     50    -12

$$\bar{x} = A + h\bar{u} = 22.5 + 5 \times \left(\frac{-12}{50}\right) = 22.5 - 1.2 = 21.3$$


Problem 4. Find the mean mark of students from the following table. 


Marks No. of students 
0 and above 30 
10 and above 26 
20 and above 21 
30 and above 14 
40 and above 10 
50 and above 0 


Solution. We express the above data in the form of a frequency table as
follows:

Marks    Mid x_i   No. of students f_i   f_i x_i
00-10       5              4                20
10-20      15              5                75
20-30      25              7               175
30-40      35              4               140
40-50      45             10               450
50-         -              0                 -
Total       -             30               860

$$\bar{x} = \frac{\sum f_i x_i}{N} = \frac{860}{30} = 28.67$$

Problem 5. Calculate (i) the mean price (ii) the weighted mean price of the
following food articles from the table given below.


Article of food Quantity in Kgs Price per Kg. 
Rice 30 4.50 
Wheat 10 2.75 
Sugar 5.5 6.25 
Oil 3.5 16.50 
Flour 4.5 4.00 
Ghee 1.5 40.00 
Onion 9 3.25 
Solution.

Article of food   Price per kg in Rs. x_i   Quantity in kgs w_i   x_i w_i
Rice                     4.50                     30               135.00
Wheat                    2.75                     10                27.50
Sugar                    6.25                      5.5              34.38
Oil                     16.50                      3.5              57.75
Flour                    4.00                      4.5              18.00
Ghee                    40.00                      1.5              60.00
Onion                    3.25                      9                29.25
Total                   77.25                     64.0             361.88

(i) Mean price $= \frac{\sum x_i}{n} = \frac{77.25}{7}$ = Rs. 11.04

(ii) Weighted mean price $= \frac{\sum w_i x_i}{\sum w_i} = \frac{361.88}{64}$ = Rs. 5.65


Problem 6. The four parts of a distribution are as follows. 


Frequency Mean 
Part 1 50 61 
Part 2 100 70 
Part 3 120 80 
Part 4 30 83 


Find the mean of the entire distribution. 
Solution. $n_1 = 50$; $n_2 = 100$; $n_3 = 120$; $n_4 = 30$

$\bar{x}_1 = 61$; $\bar{x}_2 = 70$; $\bar{x}_3 = 80$; $\bar{x}_4 = 83$

Now, $\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2 + n_3\bar{x}_3 + n_4\bar{x}_4}{n_1 + n_2 + n_3 + n_4}$

$$= \frac{(50 \times 61)+(100 \times 70)+(120 \times 80)+(30 \times 83)}{50+100+120+30} = \frac{22140}{300} = 73.8$$

$\therefore$ The mean of the entire distribution is 73.8.
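The combined-mean formula of Theorem 1.3 translates directly into code. The sketch below is only an illustration; the function name `combined_mean` is an assumption, and the data are those of Problem 6.

```python
def combined_mean(sizes, means):
    """Mean of the pooled data: sum(n_i * mean_i) / sum(n_i)."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


# Problem 6: four parts of a distribution.
sizes = [50, 100, 120, 30]
means = [61, 70, 80, 83]
print(combined_mean(sizes, means))  # 73.8
```

Note that the simple average of the four part means, (61 + 70 + 80 + 83)/4 = 73.5, differs from 73.8 because the parts are of unequal size.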


Problem 7. Mean weight of 80 students in two classes A and B is 50 
kgs. There are 45 students in class A. The mean weight of the students in 
class B is 48. Find the mean weight of the students in class A. 


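Solution. Here $n_1 = 45$; $n_2 = 35$; $\bar{x} = 50$; $\bar{x}_2 = 48$

We need to find $\bar{x}_1$ from the formula $\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$

$$\therefore 50 = \frac{45\bar{x}_1 + 35 \times 48}{80}$$

$$\therefore 45\bar{x}_1 = (80 \times 50) - (35 \times 48) = 4000 - 1680 = 2320$$

$$\therefore \bar{x}_1 = 51.56 \text{ kgs.}$$

$\therefore$ The mean weight of the students in class A is 51.56 kgs.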

Problem 8. The average weight for a group of 40 students was 
calculated to be 58 kgs. It was later discovered that weight of one student 
was misread as 75 kgs instead of the correct weight 57 kgs. Find the 
correct average. 


Solution. Total weight of 40 students $= 40 \times 58 = 2320$

Total weight after correction $= 2320 - 75 + 57 = 2302$

$\therefore$ After correction, the average $= \frac{2302}{40} = 57.55$
Problem 9. Show that (i) the A.M. of the first n natural numbers is
$\frac{1}{2}(n+1)$; (ii) the weighted A.M. of the first n natural numbers whose
weights are equal to the corresponding numbers is equal to $\frac{1}{3}(2n+1)$.

Solution. (i) A.M. of the first n natural numbers $= \frac{\sum x_i}{n}$

$$= \frac{1+2+\cdots+n}{n} = \frac{n(n+1)}{2n} = \frac{1}{2}(n+1)$$

(ii) The required weighted A.M. $= \frac{\sum w_i x_i}{\sum w_i}$

$$= \frac{1^2+2^2+\cdots+n^2}{1+2+\cdots+n} = \frac{n(n+1)(2n+1)/6}{n(n+1)/2} = \frac{1}{3}(2n+1)$$

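Problem 10. The frequencies of the values 0, 1, 2, ..., n of a variable are
given respectively by $1, {}^{n}C_{1}, {}^{n}C_{2}, \ldots, {}^{n}C_{n}$. Show that the mean is $\frac{1}{2}n$.

Solution.

$$\sum f_i x_i = 0 + {}^{n}C_{1} + 2\,{}^{n}C_{2} + 3\,{}^{n}C_{3} + \cdots + n\,{}^{n}C_{n}$$

$$= n + 2\left[\frac{n(n-1)}{2!}\right] + 3\left[\frac{n(n-1)(n-2)}{3!}\right] + \cdots + n$$

$$= n\left[1 + (n-1) + \frac{(n-1)(n-2)}{2!} + \cdots + 1\right]$$

$$= n\left[1 + {}^{n-1}C_{1} + {}^{n-1}C_{2} + \cdots + {}^{n-1}C_{n-1}\right]$$

$$= n(1+1)^{n-1} = n\,2^{n-1}$$

Also $\sum f_i = (1+1)^n = 2^n$.

$$\therefore \bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{n\,2^{n-1}}{2^n} = \frac{n}{2}$$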
Exercises.

1. Calculate the A.M. from the following data.

Value        1    2    3    4    5    6    7    8    9
Frequency    7   11   16   17   26   31   11    1    1


2. Find the mean mark of the following frequency distribution 


Marks             0-10   10-20   20-30   30-40   40-50
No. of students     3      5       9       3       2


3. Find the mean mark of students from the following table 


Marks           No. of students    Marks            No. of students
0 and above          80            60 and above          28
10 and above         77            70 and above          16
20 and above         72            80 and above          10
30 and above         65            90 and above           8
40 and above         55            100 and above          0
50 and above         43


4.Find the mean for the following data 


(i) 


Class Frequency Class Frequency 
0-9 32 50-59 167 
10-19 65 60-69 98 
20-29 100 70-79 46 
30-39 184 80-89 20 
40-49 228 90-99 0 
(ii)

Temperature (centigrade)   No. of days    Temperature (centigrade)   No. of days
-40 to -30                     10         0 to 10                        65
-30 to -20                     28         10 to 20                      180
-20 to -10                     30         20 to 30                       10
-10 to 0                       42         Total                         365


1.3 PARTITION VALUES (MEDIAN, QUARTILES, 
DECILES AND PERCENTILES) 


Partition values are those values of the variate which divide the
total frequency into a number of equal parts. Some important partition
values are the median, quartiles, deciles and percentiles.


Median 


The median of a frequency distribution is the value of the variate
which divides the total frequency into two equal parts. In other words, the median
is the value of the variate for which the cumulative frequency is $\frac{1}{2}N$,
where N is the total frequency.

In the case of ungrouped data, if the n values of the variate are
arranged in ascending or descending order of magnitude, the median is
the middle value if n is odd, and it is taken as the arithmetic mean of the
two middle values if n is even.


Example. Consider the values 54, 81, 84, 71, 61, 57, 68, 54, 56, 67, 49.
Arranging these values in ascending order of magnitude we get

49, 54, 54, 56, 57, 61, 67, 68, 71, 81, 84

Since there are 11 items, the 6th item, namely 61, is the median.


Note. In the case of a discrete frequency distribution we calculate the
median as follows.

1. Calculate $\frac{1}{2}N$, where $N = \sum f_i$.
2. Find the cumulative frequency just greater than $\frac{1}{2}N$.
3. The corresponding value of the variate is the median.
Example. Consider the following discrete frequency distribution.

x        f     Less than c.f.
1        5          5
2        9         14
3       18         32
4       12         44
5        9         53
6        7         60
Total   60          -

Here N = 60. Hence $\frac{1}{2}N = 30$.
The value of x for which the c.f. is just greater than 30 is x = 3.

$\therefore$ x = 3 is the median of the frequency distribution.
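The three-step rule for a discrete frequency distribution can be sketched in Python as follows. The function name `discrete_median` is an illustrative assumption. The frequencies 9 for x = 2 and x = 5 are inferred from the cumulative frequency column (14 and 53) and the total of 60.

```python
from itertools import accumulate


def discrete_median(values, freqs):
    """Return the first value whose 'less than' cumulative frequency exceeds N/2.

    `values` must already be sorted in ascending order.
    """
    half = sum(freqs) / 2
    for value, cum_freq in zip(values, accumulate(freqs)):
        if cum_freq > half:
            return value
    raise ValueError("frequencies must sum to a positive total")


values = [1, 2, 3, 4, 5, 6]
freqs = [5, 9, 18, 12, 9, 7]
print(list(accumulate(freqs)))         # [5, 14, 32, 44, 53, 60]
print(discrete_median(values, freqs))  # 3
```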


We now derive a formula for calculating the median in the case 
of a grouped frequency distribution. 


Definition. For a grouped frequency distribution the median class is
defined to be the class where the less than cumulative frequency is just
greater than $\frac{1}{2}N$.


Quartiles.

Definition. Consider a frequency distribution with total frequency N.
The value of the variate for which the cumulative frequency is N/4 is
called the first quartile or lower quartile and it is denoted by $Q_1$.

Similarly, the value of the variate for which the cumulative frequency is
3N/4 is called the third quartile or upper quartile and it is denoted by $Q_3$.

Clearly, the median is the second quartile and it can also be denoted by $Q_2$.

In the case of ungrouped data with n items, $Q_1$ is calculated as follows.
Let $i = \left[\frac{1}{4}(n+1)\right]$, the integral part of $\frac{1}{4}(n+1)$.
Let $q = \frac{1}{4}(n+1) - \left[\frac{1}{4}(n+1)\right]$. Hence q is the fractional part.

Then $Q_1 = x_i + q(x_{i+1} - x_i)$. Similarly $Q_3 = x_i + q(x_{i+1} - x_i)$,
where $i = \left[\frac{3}{4}(n+1)\right]$ and $q = \frac{3}{4}(n+1) - \left[\frac{3}{4}(n+1)\right]$.


In the case of a grouped frequency distribution the quartiles are
calculated by using the formulae

$$Q_1 = l + \frac{\left(\frac{N}{4} - m\right)h}{f_k} \quad \text{and} \quad Q_3 = l + \frac{\left(\frac{3N}{4} - m\right)h}{f_k}$$

where l is the lower limit of the class in which the particular quartile lies,
$f_k$ is the frequency of this class, h is the width of the class and m is the
cumulative frequency of the preceding class.


Deciles.

Consider a frequency distribution with total frequency N. The
values of the variate for which the cumulative frequencies are $\frac{iN}{10}$ (i =
1, 2, ..., 9) are called deciles. The i-th decile is denoted by $D_i$. Clearly
the median is the fifth decile. Hence the median can also be denoted by $D_5$.

In the case of ungrouped data with n items, for k = 1, 2, 3, ..., 9,

$$D_k = x_i + q(x_{i+1} - x_i), \quad \text{where } i = \left[\frac{k(n+1)}{10}\right] \text{ and } q = \frac{k(n+1)}{10} - \left[\frac{k(n+1)}{10}\right]$$

As before, for a grouped frequency distribution we can prove that

$$D_j = l + \frac{\left(\frac{jN}{10} - m\right)h}{f_k}, \quad j = 1, 2, \ldots, 9$$


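Percentiles

Percentiles are the values of the variate for which the cumulative
frequencies are $\frac{jN}{100}$ (j = 1, 2, ..., 99). The j-th percentile is denoted by $P_j$.
Clearly the median is the 50th percentile and hence the median can also be denoted
by $P_{50}$.

In the case of ungrouped data with n items, for k = 1, 2, ..., 99,

$$P_k = x_i + q(x_{i+1} - x_i), \quad \text{where } i = \left[\frac{k(n+1)}{100}\right] \text{ and } q = \frac{k(n+1)}{100} - \left[\frac{k(n+1)}{100}\right]$$

Percentiles are got from the following formula in the case of a grouped
frequency distribution:

$$P_j = l + \frac{\left(\frac{jN}{100} - m\right)h}{f_k}, \quad j = 1, 2, \ldots, 99$$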
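The grouped formulae for the median, quartiles, deciles and percentiles differ only in the fraction of N that is used, so one routine can serve all of them. The Python sketch below is a minimal illustration under that reading; the function name `grouped_partition_value` and its parameters are assumptions, and it supposes classes of equal width listed in ascending order. The data are the completed distribution of Solved Problem 4 of this section, whose median is stated to be 35.

```python
def grouped_partition_value(lower_limits, width, freqs, fraction):
    """Value below which `fraction` of the total frequency lies.

    Uses l + (fraction * N - m) * h / f_k, where class k is the first one
    whose cumulative frequency reaches fraction * N.
    """
    target = fraction * sum(freqs)
    cum_before = 0  # m: cumulative frequency of the preceding classes
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cum_before + f >= target:
            return lower + (target - cum_before) * width / f
        cum_before += f
    raise ValueError("fraction must lie between 0 and 1")


lower_limits = [0, 10, 20, 30, 40, 50, 60]
freqs = [10, 20, 35, 40, 25, 25, 15]  # N = 170

print(grouped_partition_value(lower_limits, 10, freqs, 0.5))  # 35.0
```

Calling the same function with `fraction` set to 0.25, 0.75, j/10 or j/100 gives the quartiles, deciles and percentiles respectively.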
Solved Problems

Problem 1. Find the median and quartiles of the heights in cm of
eleven students given by 66, 65, 64, 70, 61, 60, 56, 63, 60, 67, 62.

Solution. Arranging the given data in ascending order of magnitude we
get 56, 60, 60, 61, 62, 63, 64, 65, 66, 67, 70.

Here n = 11. Since n is odd, the median is the sixth item, which is equal to 63.

$Q_1$ = size of the $\frac{1}{4}(n+1)$-th item.

$\therefore Q_1$ = third item = 60

$Q_3$ = size of the $\frac{3}{4}(n+1)$-th item = 9th item = 66


Problem 2. Find the median and quartile marks of 10 students in a
Statistics test whose marks are given as 40, 90, 61, 68, 72, 43, 50, 84, 75, 33.

Solution. Arranging in ascending order of magnitude we get
33, 40, 43, 50, 61, 68, 72, 75, 84, 90. Here n = 10.

Hence the median is the average of the two middle items, viz. 61 and 68.

$\therefore$ Median $= \frac{1}{2}(61+68) = 64.5$ marks.

First quartile.

Here $\left[\frac{1}{4}(n+1)\right] = 2$ and $q = \frac{1}{4}(n+1) - \left[\frac{1}{4}(n+1)\right] = .75$

$\therefore Q_1 = x_2 + .75(x_3 - x_2) = 40 + .75(43 - 40) = 42.25$

Third quartile. $\left[\frac{3}{4}(n+1)\right] = 8$ and $q = \frac{3}{4}(n+1) - \left[\frac{3}{4}(n+1)\right] = .25$

$\therefore Q_3 = x_8 + .25(x_9 - x_8) = 75 + .25(84 - 75) = 77.25$
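The rule used for ungrouped data (integral part i and fractional part q of the position k(n + 1)/parts) can be sketched in Python as below. The function name `ungrouped_partition_value` is an illustrative assumption. Note that many statistical packages use other interpolation conventions, so their quartiles may differ slightly from these values.

```python
import math


def ungrouped_partition_value(data, k, parts):
    """k-th partition value (of `parts` equal parts) by the (n + 1) rule.

    position = k * (n + 1) / parts; i is its integral part and q its
    fractional part, and the result is x_i + q * (x_{i+1} - x_i) (1-based).
    """
    xs = sorted(data)
    n = len(xs)
    position = k * (n + 1) / parts
    i = math.floor(position)
    q = position - i
    if i < 1 or i > n:
        raise ValueError("position falls outside the data")
    if q == 0:
        return xs[i - 1]
    if i == n:
        raise ValueError("position falls outside the data")
    return xs[i - 1] + q * (xs[i] - xs[i - 1])


marks = [40, 90, 61, 68, 72, 43, 50, 84, 75, 33]
print(ungrouped_partition_value(marks, 1, 4))  # 42.25
print(ungrouped_partition_value(marks, 2, 4))  # 64.5
print(ungrouped_partition_value(marks, 3, 4))  # 77.25
```

With `parts` set to 10 or 100 the same function gives deciles and percentiles.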


Problem 3. From the following data calculate the percentage of tenants
paying monthly rent (i) more than Rs. 105 (ii) between Rs. 130 and Rs. 190.

Monthly rent   No. of tenants    Monthly rent   No. of tenants
60-80               18           140-160             88
80-100              21           160-180             75
100-120             45           180-200             18
120-140             85           Total              350

Solution. (i) Number of tenants paying more than Rs. 105 is

$$\left(\frac{120-105}{20}\right) \times 45 + 85 + 88 + 75 + 18 = 34 + 266 = 300 \text{ (approximately)}$$

$\therefore$ Required percentage $= \frac{300}{350} \times 100 = 85.7$ (approximately)

(ii) Number of tenants paying rent between Rs. 130 and Rs. 190 is

$$\left(\frac{140-130}{20}\right) \times 85 + 88 + 75 + \left(\frac{190-180}{20}\right) \times 18 = 42.5 + 88 + 75 + 9 = 215 \text{ (approximately)}$$

$\therefore$ Required percentage $= \frac{215}{350} \times 100 = 61.43$


Problem 4. An incomplete distribution is given below. 


Class    Frequency    Class    Frequency
0-10        10        40-50        ?
10-20       20        50-60       25
20-30        ?        60-70       15
30-40       40        Total      170

The median is 35. Find the missing frequencies.

Solution. Let the frequency corresponding to the class 20-30 be $f_1$ and
that of the class 40-50 be $f_2$.

$\therefore f_1 + f_2 = 170 - (10+20+40+25+15)$

$\therefore f_1 + f_2 = 60$ ........ (1)

Now, the median 35 lies in the median class 30-40.
$\therefore l = 30$; $m = 10 + 20 + f_1$; $f_k = 40$ and h = 10.

We have median $= l + \left(\frac{\frac{N}{2} - m}{f_k}\right) \times h$

$\therefore 35 = 30 + \left(\frac{85 - 30 - f_1}{40}\right) \times 10 = 30 + \frac{1}{4}(55 - f_1)$

$\therefore f_1 = 35$ ........ (2)

Using (2) in (1) we get $f_2 = 25$.

$\therefore f_1 = 35$ and $f_2 = 25$ are the missing frequencies.


Exercises 


1. Obtain the median for the following frequency distribution. 


x:   1    2    3    4    5    6    7    8    9
f:   8   10   11   16   20   25   15    9


2. Find the median, lower and upper quartiles, 8th decile and 56th
percentile for the following distribution of 245 workers.

Monthly wages   No. of workers    Monthly wages   No. of workers
1-2.99                6           9-10.99               21
3-4.99               53           11-12.99              16
5-6.99               85           13-14.99               4
7-8.99               56           15-16.99               4


3.Calculate the median of the following series. 


Wages in Rs. No of workers 
More than 100 5 

More than 90 17 

More than 80 37 

More than 70 43 

More than 60 49 

More than 50 49 

More than 40 51 
4. Find the three quartiles for the following distribution.

Marks   No. of students    Marks   No. of students
5-10          5            25-30         5
10-15         6            30-35         4
15-20        15            35-40         2
20-25        10            40-45         2

1.4 MODE


In a distribution the value of the variate which occurs most 
frequently and around which the other values of variates cluster densely 
is called the mode or modal value of the distribution. 


In the case of a discrete frequency distribution mode is the value 
of the variate corresponding to the maximum frequency. 


Example. Consider the discrete frequency distribution

x:   1    2    3    4    5    6    7    8    9   10
f:   8   13   47  105   28    9    5    3    2    1

Here the maximum frequency is 105.

The value corresponding to this maximum frequency is 4. Hence
the mode is 4.


In the case of a grouped frequency distribution the mode is
computed by the formula

$$\text{Mode} = l + \frac{h(f - f_1)}{2f - f_1 - f_2}$$

where l is the lower boundary of the modal class (the class having maximum
frequency); f is the maximum frequency; $f_1$ and $f_2$ are the frequencies of
the classes preceding and following the modal class; h is the width of
the class.

An alternate formula for finding the mode is given by

$$\text{Mode} = l + \frac{h f_2}{f_1 + f_2}$$

with the above notations.
Note 1. In the case of irregularities in the distribution, or when the
maximum frequency is repeated, or when the maximum frequency occurs at
the very beginning or at the end, the modal class is determined by the
method of grouping and then the mode is got by using any one of the
formulae.

Note 2. A frequency distribution may have more than one mode, in
which case it is called a multimodal distribution. If there is only one mode it is
called a unimodal distribution.

Note 3. There is an interesting empirical relationship between the mean,
median and mode which appears to hold for unimodal curves of moderate
asymmetry, namely

Mean - Mode = 3(Mean - Median), (i.e.) Mode = 3 Median - 2 Mean.
Solved problems. 


Problem 1. The following are the heights in c.m. of 10 students. 
Calculate the modal height 63,65,66,65,64,65,65,61,67,68 


Solution. Since 65 occurs 4 times and no other item occurs 4 or more 
than four times 65 c.m. is the modal height. 


Problem 2. Calculate the mode for the frequency distribution given in
solved problem 3 in 2.2

Solution. Here the maximum frequency 52 occurs in the class 30.5 - 35.5
(refer table in page 36), which is the modal class.

$\therefore l = 30.5$; $f_1 = 47$; $f_2 = 41$ and h = 5

$\therefore$ Mode $= l + \frac{h f_2}{f_1 + f_2} = 30.5 + \frac{5 \times 41}{47 + 41} = 30.5 + \frac{205}{88}$

= 32.83


Problem 3. Given that the mode of the following frequency distribution
of 70 students is 58.75, find the missing frequencies $f_1$ and $f_2$.

Class    Frequency
52-55       15
55-58       f_1
58-61       25
61-64       f_2

Solution. Since N = 70 we get $f_1 + f_2 = 30$ ........ (1)

Mode $= l + \frac{h(f - f_1)}{2f - f_1 - f_2}$

$\therefore 58.75 = 58 + \frac{3 \times (25 - f_1)}{50 - f_1 - f_2}$

$\therefore 0.75 = \frac{3 \times (25 - f_1)}{50 - f_1 - f_2}$

$\therefore 3f_1 - f_2 = 50$ (verify) ........ (2)

From (1) and (2) we get $f_1 = 20$ and $f_2 = 10$.
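Both mode formulae are short enough to check numerically. In the Python sketch below the function names `grouped_mode` and `grouped_mode_alternate` are illustrative assumptions; the inputs are the values quoted in Problems 2 and 3.

```python
def grouped_mode(lower, width, f, f_prev, f_next):
    """Mode = l + h * (f - f1) / (2f - f1 - f2) for the modal class."""
    return lower + width * (f - f_prev) / (2 * f - f_prev - f_next)


def grouped_mode_alternate(lower, width, f_prev, f_next):
    """Alternate formula: Mode = l + h * f2 / (f1 + f2)."""
    return lower + width * f_next / (f_prev + f_next)


# Problem 3 with the frequencies found there: f1 = 20, f2 = 10.
print(grouped_mode(58, 3, 25, 20, 10))  # 58.75
# Problem 2: l = 30.5, h = 5, f1 = 47, f2 = 41.
print(round(grouped_mode_alternate(30.5, 5, 47, 41), 2))  # 32.83
```

The two formulae generally give different values for the same class, so the formula used should be stated along with the result.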
Exercises. 
1. Find the mean, median and mode for the set of numbers.
(i) 6, 8, 2, 5, 9, 5, 6, 5, 2, 3
(ii) 61.7, 71.8, 65.3, 70, 69.8

2. In a moderately asymmetrical distribution, if the mean is 24.6 and the
mode is 26.1, find the median.


3. In a skewed distribution mean and median are respectively 33 
and 34.5. Find the mode. 




4. Find the mean, median and mode of the following frequency
distribution.

Class    Frequency    Class    Frequency
20-24        3        40-44       12
25-29        5        45-49        6
30-34       10        50-54        3
35-39       20        55-59        1
5. Calculate the mode from the data given below. 

Wages in Rs.   Number of workers    Wages in Rs.   Number of workers
Above 30            520             Above 70            104
Above 40            470             Above 80             45
Above 50            399             Above 90              7
Above 60            210             Above 100             0


6. Calculate the mode from the following frequency distribution. 


x:   1-9   9-17   17-25   25-33   33-41   41-49   49-57
f:    20    31      27      15      10       7       8
UNIT-II GEOMETRIC MEAN AND
HARMONIC MEAN


2.1 GEOMETRIC MEAN


Definition. The geometric mean (G.M.) of a set of n observations
$x_1, x_2, \ldots, x_n$ is the n-th root of their product. Thus, the geometric mean is

$$G = (x_1 x_2 \cdots x_n)^{1/n}$$

$$\therefore \log G = \frac{1}{n}(\log x_1 + \log x_2 + \cdots + \log x_n) = \frac{\sum \log x_i}{n}$$

$$\therefore G = \text{antilog}\left[\frac{\sum \log x_i}{n}\right]$$

In the case of a grouped frequency distribution the geometric mean is

$$G = \left(x_1^{f_1} x_2^{f_2} \cdots x_n^{f_n}\right)^{1/N}, \quad \text{where } N = \sum f_i$$

As before we can write $G = \text{antilog}\left[\frac{1}{N}\left(\sum f_i \log x_i\right)\right]$


2.2 HARMONIC MEAN 


The harmonic mean (H.M.) of a set of n observations $x_1, x_2, \ldots, x_n$ is
defined to be the reciprocal of the arithmetic mean of the reciprocals of
the observations.

Thus the harmonic mean is $H = \frac{1}{\frac{1}{n}\sum \frac{1}{x_i}}$

In the case of a grouped frequency distribution the harmonic mean is

$$H = \frac{1}{\frac{1}{N}\sum \frac{f_i}{x_i}}, \quad \text{where } N = \sum f_i$$
Solved Problems.

Problem 1. Find the G.M. and H.M. of the four numbers 2, 4, 6, 27.

Solution. G.M. $= G = (2 \times 4 \times 6 \times 27)^{1/4}$

$$= [2 \times (2 \times 2) \times (2 \times 3) \times (3 \times 3 \times 3)]^{1/4} = (2^4 \times 3^4)^{1/4} = 6$$

H.M. $= H = \frac{1}{\frac{1}{4}\left(\frac{1}{2} + \frac{1}{4} + \frac{1}{6} + \frac{1}{27}\right)} = \frac{4 \times 108}{103} = 4.19$
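The logarithmic form of the G.M. and the reciprocal form of the H.M. can be sketched in Python as below. The function names are illustrative assumptions, and natural logarithms are used in place of the base-10 tables of the text, which does not change the result.

```python
import math


def geometric_mean(values, freqs=None):
    """G = antilog((1/N) * sum(f_i * log x_i)); all values must be positive."""
    freqs = freqs or [1] * len(values)
    n_total = sum(freqs)
    log_sum = sum(f * math.log(x) for x, f in zip(values, freqs))
    return math.exp(log_sum / n_total)


def harmonic_mean(values, freqs=None):
    """H = N / sum(f_i / x_i); all values must be non-zero."""
    freqs = freqs or [1] * len(values)
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))


numbers = [2, 4, 6, 27]
print(round(geometric_mean(numbers), 4))  # 6.0
print(round(harmonic_mean(numbers), 4))   # 4.1942
```

Passing a list of frequencies as the second argument gives the grouped forms of both means.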


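Problem 2. Find the G.M. and H.M. of the following distribution.

x:   1   2   3   4   5
f:   2   4   3   2   1

Solution. G.M. $= G = (1^2 \times 2^4 \times 3^3 \times 4^2 \times 5)^{1/12}$

$$= (16 \times 27 \times 80)^{1/12} = \text{antilog}\left[\frac{1}{12}(\log 34560)\right]$$

= antilog (.3782) = 2.389

H.M. $= H = \frac{12}{2\left(\frac{1}{1}\right) + 4\left(\frac{1}{2}\right) + 3\left(\frac{1}{3}\right) + 2\left(\frac{1}{4}\right) + 1\left(\frac{1}{5}\right)}$

$$= \frac{12}{2 + 2 + 1 + \frac{1}{2} + \frac{1}{5}} = 2.11$$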

Problem 3. Find the G.M for the following frequency distribution. 


Marks 0-10 10-20 20-30 30-40 
No. of 5 8 3 4 
students 
Solution.

Mid x_i   Frequency f_i   log10 x_i   f_i log10 x_i
 5              5          0.6990        3.4950
15              8          1.1761        9.4088
25              3          1.3979        4.1937
35              4          1.5441        6.1764
Total          20            -          23.2739

G.M. $= \text{antilog}\left[\frac{1}{N}\left(\sum f_i \log x_i\right)\right]$

= antilog $\left[\frac{1}{20}(23.2739)\right]$
= antilog (1.1637)

= 14.58


Problem 4. Find the H.M. for the following frequency distribution. 


Class 0-10 10-20 20-30 30-40 40-50 


Frequency 15 10 7 5 3 
Solution.

Mid x_i   f_i   1/x_i    f_i/x_i
 5        15    .2000    3.0000
15        10    .0667    0.6670
25         7    .0400    0.2800
35         5    .0286    0.1430
45         3    .0222    0.0666
Total     40      -      4.1566

Harmonic mean $= H = \frac{1}{\frac{1}{N}\sum \frac{f_i}{x_i}} = \frac{40}{4.1566} = 9.62$


Problem 5 Calculate the average speed of a train running at the rate 
of 20 k.m per hour during the first 100 k.m., at 25 k.m.p.h. during the 
second 100 k.m and at 30 k.m.p.h. during the third 100 k.m. 


Solution. Clearly the weighted H.M. is the proper average.

Weighted H.M. $= \frac{\sum w_i}{\sum \left(\frac{w_i}{x_i}\right)}$

$$= \frac{100 + 100 + 100}{\frac{100}{20} + \frac{100}{25} + \frac{100}{30}} = 24.32 \text{ k.m.p.h. (verify)}$$
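The reason the weighted H.M. is the proper average here is that average speed is total distance divided by total time. A minimal Python sketch, with the illustrative function name `average_speed`, makes this explicit and contrasts it with the arithmetic mean of the three speeds.

```python
def average_speed(distances, speeds):
    """Total distance divided by total time, i.e. the distance-weighted H.M."""
    total_time = sum(d / v for d, v in zip(distances, speeds))
    return sum(distances) / total_time


distances = [100, 100, 100]  # km
speeds = [20, 25, 30]        # km per hour

print(round(average_speed(distances, speeds), 2))  # 24.32
print(sum(speeds) / len(speeds))                   # 25.0
```

The arithmetic mean of the speeds (25 km per hour) overstates the true average because the train spends more time on the slower stretches.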
Exercises


1. Calculate the A.M., G.M. and H.M. of the following observations
and show that A.M. > G.M. > H.M.: 32, 35, 36, 37, 39, 41, 43


2. Calculate the H.M. of the following series of monthly
expenditure in Rs. of a batch of students: 125, 130, 75, 10, 45, 0.5, 0.4, 500, 15


3. Find the G.M for the distribution. 


Class Frequency Class Frequency 
0-9 32 50-59 167 
10-19 65 60-69 98 
20-29 100 70-79 46 
30-39 184 80-89 20 
40-49 288 Total 1000 
4. Find the H.M of the following distribution. 
X 2 3 4 5 6 
f 5 7 11 9 8 


5. Calculate the G.M. and H.M. of the following frequency distribution.

Class        2-4   4-6   6-8   8-12
Frequency     20    40    30    10
UNIT-III MEASURES OF DISPERSION


3.1 INTRODUCTION 


You have learnt various measures of central tendency. Measures of
central tendency help us to represent the entire mass of the data by a
single value. Can the central tendency describe the data fully and
adequately? In order to understand it, let us consider an example.

The daily incomes of the workers in two factories are:

Factory A: 35 45 50 65 70 90 100

Factory B: 60 65 65 65 65 65 70

Here we observe that in both the groups the mean of the data is the
same, namely, 65.

(i) In group A, the observations are much more scattered from the
mean.

(ii) In group B, almost all the observations are concentrated around the
mean.

Certainly, the two groups differ even though they have the same
mean. Thus, there arises a need to differentiate between the groups. We
need some other measure which is concerned with the
scatteredness (or spread) of the data.

To do this, we study what are known as measures of dispersion.
3.2 MEASURES OF DISPERSION 


Definition. Dispersion of a distribution is the amount of scatteredness
of the individual values from a measure of central tendency. There are
four measures of dispersion which are in common use. They are as
follows.

(i) Range (ii) Quartile deviation (iii) Mean deviation (iv) Standard deviation.

Range

It is the simplest method of studying dispersion. Range is
the difference between the smallest value and the largest value of a
series. While computing the range, we do not take into account the
frequencies of the different groups.

Example. The maximum value is 49 and the minimum value is 1. Hence
the range is 48.

Quartile Deviation (Q.D.) (Semi inter quartile range)

The quartile deviation (Q.D.) or semi inter quartile range is
defined by Q.D. $= \frac{1}{2}(Q_3 - Q_1)$, where
$Q_1$ and $Q_3$ are the first and the third quartiles of the
distribution.

Example. $Q_1 = 17$ and $Q_3 = 38.5$

Hence Q.D. $= \frac{1}{2}(38.5 - 17) = 10.75$


Mean Deviation. The mean deviation of a frequency distribution
from any average A is defined by M.D. $= \frac{\sum f_i |x_i - A|}{N}$, where $N = \sum f_i$.

Example. For the data, $\bar{x} = 27.3$ (refer example under A.M. in section
1.2). Now,

Mid x_i   f_i   |x_i - 27.3|   f_i |x_i - 27.3|
 4.5      11       22.8            250.8
14.5      20       12.8            256.0
24.5      16        2.8             44.8
34.5      36        7.2            259.2
44.5      17       17.2            292.4
Total    100         -            1103.2

$\therefore$ M.D. about the mean $= \frac{1103.2}{100} = 11.032$


Standard Deviation.

A common measure of dispersion which is
preferred in most circumstances in statistics is the standard deviation.

Definition. The standard deviation $\sigma$ of a frequency distribution
is defined by $\sigma = \left[\frac{\sum f_i (x_i - \bar{x})^2}{N}\right]^{1/2}$, where $N = \sum f_i$ and $\bar{x}$ is the arithmetic
mean of the frequency distribution.

The square of the standard deviation of a frequency
distribution is called the variance of the frequency distribution.

Hence variance $= \sigma^2$.

Note. If $\sigma_s^2$ is the variance of a sample of size n, the "best"
estimate for the population variance $\sigma^2$ is not $\sigma_s^2$
but $\left(\frac{n}{n-1}\right)\sigma_s^2$. For this reason many authors define the standard
deviation by the formula $\sigma = \left[\frac{\sum f_i (x_i - \bar{x})^2}{N-1}\right]^{1/2}$.
For large values of N the two formulae for standard
deviation are practically indistinguishable. Throughout this book we
use the first formula for finding the standard deviation of a frequency
distribution. Both the formulae for standard deviation find a place in
modern calculators.


Definition. The root mean square deviation of a frequency
distribution is defined to be $s = \left[\frac{\sum f_i (x_i - A)^2}{N}\right]^{1/2}$,
where A is any arbitrary origin, and $s^2$ is called the mean square
deviation.

Definition. The coefficient of variation of a frequency distribution is
defined to be C.V. $= \frac{\sigma}{\bar{x}} \times 100$.

For comparing the variability of two sets of
observations we calculate the C.V. for
each set. The set having the smaller C.V.
is said to be more consistent than the other.

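Example 1. Consider the numbers 1, 2, 3, 4, 5, 6, 7.
Their arithmetic mean is $\bar{x} = 4$.
Now, $\sum (x_i - 4)^2 = 28$ (verify).

$$\therefore \sigma = \left[\frac{\sum (x_i - \bar{x})^2}{n}\right]^{1/2} = \left(\frac{28}{7}\right)^{1/2} = 2$$

Example 2. For the frequency distribution of the example in section 1.2, $\bar{x} = 27.3$.
Hence we have the following table.

x_i     f_i   x_i - 27.3   (x_i - 27.3)^2   f_i (x_i - 27.3)^2
 4.5     11     -22.8          519.84            5718.24
14.5     20     -12.8          163.84            3276.80
24.5     16      -2.8            7.84             125.44
34.5     36       7.2           51.84            1866.24
44.5     17      17.2          295.84            5029.28
Total   100        -              -             16016

$$\therefore \sigma^2 = \frac{\sum f_i (x_i - \bar{x})^2}{N} = \frac{16016}{100} = 160.16$$

$\therefore \sigma = 12.66$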
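The mean deviation and standard deviation of this grouped distribution can be reproduced with the following Python sketch. The function names are illustrative assumptions, and the divisor-N definition of the standard deviation is used, as in this unit.

```python
import math


def grouped_mean(mids, freqs):
    """x_bar = sum(f_i * x_i) / N."""
    return sum(f * x for x, f in zip(mids, freqs)) / sum(freqs)


def mean_deviation(mids, freqs, about):
    """M.D. = sum(f_i * |x_i - A|) / N about the chosen average A."""
    return sum(f * abs(x - about) for x, f in zip(mids, freqs)) / sum(freqs)


def standard_deviation(mids, freqs):
    """sigma = sqrt(sum(f_i * (x_i - mean)^2) / N), the divisor-N formula."""
    mean = grouped_mean(mids, freqs)
    variance = sum(f * (x - mean) ** 2 for x, f in zip(mids, freqs)) / sum(freqs)
    return math.sqrt(variance)


mids = [4.5, 14.5, 24.5, 34.5, 44.5]
freqs = [11, 20, 16, 36, 17]
mean = grouped_mean(mids, freqs)

print(round(mean, 2))                               # 27.3
print(round(mean_deviation(mids, freqs, mean), 3))  # 11.032
print(round(standard_deviation(mids, freqs), 2))    # 12.66
```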


We now establish a relation between the root mean square deviation
s and the standard deviation $\sigma$.

Theorem 3.1. $\sigma^2 = s^2 - d^2$, where $d = \bar{x} - A$.

Proof.

$$s^2 = \frac{\sum f_i (x_i - A)^2}{N} = \frac{\sum f_i (x_i - \bar{x} + \bar{x} - A)^2}{N}$$

$$= \frac{\sum f_i (x_i - \bar{x})^2}{N} + \frac{2d \sum f_i (x_i - \bar{x})}{N} + \frac{d^2 \sum f_i}{N}$$

$$= \sigma^2 + d^2 \quad (\text{since } \sum f_i (x_i - \bar{x}) = 0)$$

$$\therefore \sigma^2 = s^2 - d^2$$

Corollary. The standard deviation is the least possible root mean
square deviation.

Proof. We have $s^2 = \sigma^2 + d^2$.
$\therefore s^2$ is least when d = 0. Hence the least value of
$s^2$ is $\sigma^2$.


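The following theorem gives another formula for the
calculation of the standard deviation of a frequency distribution.

Theorem 3.2. $\sigma = \left[\frac{\sum f_i x_i^2}{N} - \left(\frac{\sum f_i x_i}{N}\right)^2\right]^{1/2}$

Proof. $\sigma^2 = \frac{1}{N} \sum f_i (x_i - \bar{x})^2$

$$= \frac{1}{N}\left[\sum f_i (x_i^2 - 2x_i\bar{x} + \bar{x}^2)\right]$$

$$= \frac{\sum f_i x_i^2}{N} - 2\bar{x}\,\frac{\sum f_i x_i}{N} + \bar{x}^2 = \frac{\sum f_i x_i^2}{N} - \bar{x}^2 = \frac{\sum f_i x_i^2}{N} - \left(\frac{\sum f_i x_i}{N}\right)^2$$

Hence the theorem.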

Theorem 3.3. The standard deviation $\sigma$ is independent of change of
origin and is dependent on change of scale.

Proof. We have $\sigma_x^2 = \frac{1}{N} \sum f_i (x_i - \bar{x})^2$.

Suppose we change the variable $x_i$ to $u_i$, where $u_i = x_i - A$,
A being an arbitrary origin.

We know that $\bar{u} = \bar{x} - A$.
Now, $u_i - \bar{u} = x_i - \bar{x}$.
Now,

$$\sigma_x^2 = \frac{1}{N} \sum f_i (x_i - \bar{x})^2 = \frac{1}{N} \sum f_i (u_i - \bar{u})^2 = \sigma_u^2$$

Hence $\sigma$ is independent of change of origin.

Now, suppose we change the variable
$x_i$ to $v_i$, where $v_i = x_i / h$.

Then $\bar{v} = \bar{x}/h$ and

$v_i - \bar{v} = \frac{1}{h}(x_i - \bar{x})$

Now, $\sigma_x^2 = \frac{1}{N} \sum f_i (x_i - \bar{x})^2 = \frac{h^2}{N} \sum f_i (v_i - \bar{v})^2 = h^2 \sigma_v^2$

$\therefore$ The S.D. is dependent on change of scale.

Note. When we effect a change in origin as well as in scale, $\sigma^2$ is
multiplied by the square of the scale introduced.

Hence $\sigma_x^2 = h^2\left[\frac{\sum f_i u_i^2}{N} - \left(\frac{\sum f_i u_i}{N}\right)^2\right]$, where $u_i = \frac{x_i - A}{h}$.


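Theorem 3.4 (Variance of combined set) Let the means and standard deviations of two sets containing $n_1$ and $n_2$ members be $\bar{x}_1, \bar{x}_2$ and $\sigma_1, \sigma_2$ respectively. Suppose the two sets are grouped together as one set of $(n_1 + n_2)$ members. Let $\bar{x}$ be the mean and $\sigma$ be the standard deviation of this set. Then

$\sigma^2 = \frac{1}{n_1 + n_2}\left[n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)\right]$

where $d_1 = \bar{x}_1 - \bar{x}$ and $d_2 = \bar{x}_2 - \bar{x}$.


Proof. $\sigma^2 = \frac{1}{n_1 + n_2}\left[\sum_{i=1}^{n_1+n_2} (x_i - \bar{x})^2\right]$

$= \frac{1}{n_1 + n_2}\left[\sum_{i=1}^{n_1} (x_i - \bar{x})^2 + \sum_{i=n_1+1}^{n_1+n_2} (x_i - \bar{x})^2\right]$

Now, $\sum_{i=1}^{n_1} (x_i - \bar{x})^2 = \sum_{i=1}^{n_1} (x_i - \bar{x}_1 + \bar{x}_1 - \bar{x})^2$

$= \sum_{i=1}^{n_1} (x_i - \bar{x}_1)^2 + 2d_1\sum_{i=1}^{n_1} (x_i - \bar{x}_1) + n_1 d_1^2$

$= n_1\sigma_1^2 + n_1 d_1^2$ (since $\sum_{i=1}^{n_1} (x_i - \bar{x}_1) = 0$)

Similarly $\sum_{i=n_1+1}^{n_1+n_2} (x_i - \bar{x})^2 = n_2\sigma_2^2 + n_2 d_2^2$

Hence $\sigma^2 = \frac{1}{n_1 + n_2}\left[(n_1\sigma_1^2 + n_1 d_1^2) + (n_2\sigma_2^2 + n_2 d_2^2)\right]$ ........ (1)

$= \frac{1}{n_1 + n_2}\left[n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)\right]$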
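
As a sanity check on Theorem 3.4, the sketch below combines two groups from their summary statistics and compares the result with a direct computation on pooled raw data. The function name `combine` and the two small sample sets are hypothetical, chosen only for illustration.

```python
from math import sqrt
from statistics import mean, pstdev

def combine(n1, mean1, sd1, n2, mean2, sd2):
    """Mean and S.D. of two groups pooled together (Theorem 3.4)."""
    n = n1 + n2
    m = (n1 * mean1 + n2 * mean2) / n
    d1, d2 = mean1 - m, mean2 - m
    variance = (n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)) / n
    return m, sqrt(variance)

# Hypothetical raw data for two sets.
set1 = [1, 2, 3]
set2 = [10, 12, 14, 16]

m, s = combine(len(set1), mean(set1), pstdev(set1),
               len(set2), mean(set2), pstdev(set2))

# Direct computation on the pooled data should agree with the formula.
pooled = set1 + set2
print(abs(m - mean(pooled)) < 1e-9, abs(s - pstdev(pooled)) < 1e-9)
```

`pstdev` is the population standard deviation (divisor n), which matches the definition of $\sigma$ used in this unit.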
Solved Problems


Problem 1. Find (i) mean (ii) range (iii) S.D (iv) mean deviation about 
mean and (v) coefficient of variation for the following marks of 10 
students. 


20, 22, 27, 30, 40, 48, 45, 32, 31, 35 
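
Solution. (i) Mean $= \frac{1}{n}\sum x_i = \frac{330}{10} = 33$


(ii) Range = Maximum value - Minimum value

= 48 - 20 = 28


(iii) S.D $\sigma = \left[\frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2\right]^{1/2}$

Here we have $\sum x_i^2 = 11652$ (verify).

$\therefore \sigma = \left[\frac{11652}{10} - \left(\frac{330}{10}\right)^2\right]^{1/2} = (76.2)^{1/2} = 8.73$


(iv) Mean deviation about mean $= \frac{1}{n}\left[\sum |x_i - 33|\right]$

$= \frac{1}{10}[13 + 11 + 6 + 3 + 7 + 15 + 12 + 1 + 2 + 2]$

= 7.2


(v) C.V $= \left(\frac{\sigma}{\bar{x}}\right) \times 100 = \left(\frac{8.73}{33}\right) \times 100 = 26.45$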

Problem 2. Show that the variance of the first n natural numbers is $\frac{1}{12}(n^2 - 1)$.

Solution. $\sigma^2 = \frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2$

We have $\sum x_i = 1 + 2 + \cdots + n = \frac{1}{2}n(n + 1)$ and

$\sum x_i^2 = 1^2 + 2^2 + \cdots + n^2 = \frac{1}{6}n(n + 1)(2n + 1)$.

$\therefore \sigma^2 = \frac{n(n+1)(2n+1)}{6n} - \left[\frac{n(n+1)}{2n}\right]^2$

$= \frac{1}{6}(n + 1)(2n + 1) - \frac{1}{4}(n + 1)^2$

$= \frac{1}{12}\left[2(n + 1)(2n + 1) - 3(n + 1)^2\right]$

$= \frac{1}{12}\left[(n + 1)(4n + 2 - 3n - 3)\right]$

$= \frac{1}{12}(n + 1)(n - 1)$

$= \frac{1}{12}(n^2 - 1)$


Problem 3. The following table gives the monthly wages of workers in a factory. Compute (i) standard deviation (ii) quartile deviation and (iii) coefficient of variation.


Monthly wages   No. of workers   Monthly wages   No. of workers
125-175               2          375-425               4
175-225              22          425-475               6
225-275              19          475-525               1
275-325              14          525-575               1
325-375               3          Total                72


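Solution. Let A = 300, h = 50 and $u_i = \frac{1}{50}(x_i - 300)$. The table is


Mid x_i    f_i    u_i    f_i u_i    f_i u_i^2    c.f.
150          2     -3       -6         18          2
200         22     -2      -44         88         24
250         19     -1      -19         19         43
300         14      0        0          0         57
350          3      1        3          3         60
400          4      2        8         16         64
450          6      3       18         54         70
500          1      4        4         16         71
550          1      5        5         25         72

Total       72      -      -31        239          -

Mean: $\bar{x} = A + h\bar{u} = 300 + 50\left(\frac{-31}{72}\right) = 300 - 21.53 = 278.47$

(i) $\sigma^2 = h^2\left[\frac{\sum f_i u_i^2}{N} - \left(\frac{\sum f_i u_i}{N}\right)^2\right] = 50^2\left[\frac{239}{72} - \left(\frac{-31}{72}\right)^2\right]$

$\therefore \sigma = 88.52$ (verify).

(ii) Here N/4 = 18, which falls in the class 175-225 (previous cumulative frequency 2).

$Q_1 = 175 + \frac{(18 - 2) \times 50}{22} = 175 + \frac{800}{22} = 211.36$

Also 3N/4 = 54, which falls in the class 275-325 (previous cumulative frequency 43).

$Q_3 = 275 + \frac{(54 - 43) \times 50}{14} = 275 + \frac{550}{14} = 314.29$

$\therefore$ Q.D $= \frac{1}{2}(Q_3 - Q_1) = \frac{1}{2}(314.29 - 211.36) = 51.46$ (approximately)

(iii) C.V $= \frac{\sigma}{\bar{x}} \times 100 = \frac{88.52}{278.47} \times 100 = 31.79$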
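
The interpolation used for the quartiles in Problem 3 can be written as a small routine. This is a sketch under the assumption that all classes have the same width and are listed in increasing order; the name `grouped_quantile` is illustrative.

```python
def grouped_quantile(lowers, width, freqs, p):
    """Interpolated value below which a fraction p of the N items lie.

    lowers: lower class boundaries, width: common class width,
    freqs: class frequencies, p: fraction such as 0.25 or 0.75.
    """
    target = p * sum(freqs)
    cum = 0  # cumulative frequency of the classes already passed
    for lower, f in zip(lowers, freqs):
        if f > 0 and cum + f >= target:
            return lower + (target - cum) * width / f
        cum += f
    raise ValueError("p must lie in (0, 1]")

# Wage classes 125-175, 175-225, ..., 525-575 from Problem 3.
lowers = [125, 175, 225, 275, 325, 375, 425, 475, 525]
freqs = [2, 22, 19, 14, 3, 4, 6, 1, 1]

q1 = grouped_quantile(lowers, 50, freqs, 0.25)
q3 = grouped_quantile(lowers, 50, freqs, 0.75)
qd = (q3 - q1) / 2
print(round(q1, 2), round(q3, 2), round(qd, 2))
```

Working with unrounded quartiles, the quartile deviation comes out slightly different in the second decimal place from a hand computation that rounds $Q_1$ and $Q_3$ first.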


Problem 4. Find the arithmetic mean $\bar{x}$, standard deviation $\sigma$ and percentage of cases within $\bar{x} \pm \sigma$, $\bar{x} \pm 2\sigma$ and $\bar{x} \pm 3\sigma$ in the following frequency distribution.


Marks        10    9    8    7    6    5    4    3    2    1
Frequency     1    5   11   15   12    7    3    2    0    1

Solution.

x_i     f_i    f_i x_i    f_i x_i^2
10        1       10         100
 9        5       45         405
 8       11       88         704
 7       15      105         735
 6       12       72         432
 5        7       35         175
 4        3       12          48
 3        2        6          18
 2        0        0           0
 1        1        1           1

Total    57      374        2618

$\bar{x} = \frac{\sum f_i x_i}{N} = \frac{374}{57} = 6.56$

$\sigma^2 = \frac{\sum f_i x_i^2}{N} - \left(\frac{\sum f_i x_i}{N}\right)^2 = \frac{2618}{57} - \left(\frac{374}{57}\right)^2 = \frac{2618 \times 57 - 374^2}{57^2} = \frac{9350}{57^2}$

$\therefore \sigma = \left(\frac{1}{57}\right)\sqrt{9350} = 1.7$ (approximately)

Now, $\bar{x} \pm \sigma = 6.56 \pm 1.7$, that is, 8.26 and 4.86.
There are 45 items [7 + 12 + 15 + 11] which lie within 4.86 and 8.26.

$\therefore$ Percentage of cases lying within the range $\bar{x} \pm \sigma$ is $\frac{45}{57} \times 100 = 79\%$

Now $\bar{x} \pm 2\sigma = 6.56 \pm 3.4$, that is, 9.96 and 3.16.
There are 53 items [3 + 7 + 12 + 15 + 11 + 5] which lie within 3.16 and 9.96.

$\therefore$ Percentage of items lying within the range $\bar{x} \pm 2\sigma$ is $\frac{53}{57} \times 100 = 93\%$

Similarly the percentage of items lying within the range $\bar{x} \pm 3\sigma$ is 98% (verify).


Problem 5. Mean and standard deviation of the marks of two classes of 
sizes 25 and 75 are given below. 


Class A Class B 
Mean 80 85 
S.D 15 20 


Calculate the combined mean and standard deviation of the marks of the students of the two classes. Which class is performing more consistently?


Solution. Let $\bar{x}$ and $\sigma$ be the mean and standard deviation of the combined classes.

Given $\bar{x}_1 = 80$; $\bar{x}_2 = 85$; $\sigma_1 = 15$; $\sigma_2 = 20$; $n_1 = 25$; $n_2 = 75$.

$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2} = \frac{25 \times 80 + 75 \times 85}{100} = \frac{8375}{100} = 83.75$

Now, $d_1 = \bar{x}_1 - \bar{x} = 80 - 83.75 = -3.75$
$d_2 = \bar{x}_2 - \bar{x} = 85 - 83.75 = 1.25$

We have, $\sigma^2 = \frac{1}{n_1 + n_2}\left[n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)\right]$

$= \frac{1}{100}\left[25 \times 15^2 + 75 \times 20^2 + 25(-3.75)^2 + 75(1.25)^2\right]$

$= \frac{1}{100}[5625 + 30000 + 351.5625 + 117.1875]$

= 360.9375
$\therefore \sigma = 19$ (approximately).

C.V of marks of class A $= \frac{\sigma_1}{\bar{x}_1} \times 100 = \frac{15}{80} \times 100 = 18.75$

C.V of marks of class B $= \frac{\sigma_2}{\bar{x}_2} \times 100 = \frac{20}{85} \times 100 = 23.53$

Since the C.V of marks of class A is smaller than that of class B, class A is performing more consistently.


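Problem 6. Prove that for any discrete distribution the standard deviation is not less than the mean deviation from the mean.


Solution. Let m = mean deviation from the mean.
$\therefore m = \frac{1}{N}\sum f_i |x_i - \bar{x}|$
We have to prove that $\sigma$ is not less than m,
(i.e.) to prove that $\sigma^2 \ge m^2$.
Now, $\sigma^2 \ge m^2 \iff \frac{1}{N}\sum f_i (x_i - \bar{x})^2 \ge \left[\frac{1}{N}\sum f_i |x_i - \bar{x}|\right]^2$
$\iff \frac{1}{N}\sum f_i z_i^2 \ge \left[\frac{1}{N}\sum f_i z_i\right]^2$ where $z_i = |x_i - \bar{x}|$
$\iff \frac{1}{N}\sum f_i z_i^2 - \left[\frac{1}{N}\sum f_i z_i\right]^2 \ge 0$
$\iff \sigma_z^2 \ge 0$, which is true.
Hence the result.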


Problem 7. The scores of two cricketers A and B in 10 innings are given below. Find who is the better run getter and who is the more consistent player.


A scores x_i    40   25   19   80   38    8   67  121   66   76
B scores y_i    28   70   31    0   14  111   66   31   25    4


Solution. For cricketer A: $\bar{x} = \frac{540}{10} = 54$
For cricketer B: $\bar{y} = \frac{380}{10} = 38$

x_i    x_i - 54    (x_i - 54)^2    y_i    y_i - 38    (y_i - 38)^2
 40      -14           196          28      -10           100
 25      -29           841          70       32          1024
 19      -35          1225          31       -7            49
 80       26           676           0      -38          1444
 38      -16           256          14      -24           576
  8      -46          2116         111       73          5329
 67       13           169          66       28           784
121       67          4489          31       -7            49
 66       12           144          25      -13           169
 76       22           484           4      -34          1156

Total      -         10596        Total       -         10680

$\sigma_x = \left[\frac{1}{N}\sum (x_i - \bar{x})^2\right]^{1/2} = \left[\frac{10596}{10}\right]^{1/2} = \sqrt{1059.6} = 32.55$

Similarly $\sigma_y = \left[\frac{1}{N}\sum (y_i - \bar{y})^2\right]^{1/2} = \left[\frac{10680}{10}\right]^{1/2} = \sqrt{1068} = 32.68$

C.V of A $= \left(\frac{\sigma_x}{\bar{x}}\right) \times 100 = \frac{32.55}{54} \times 100 = 60.28$

C.V of B $= \left(\frac{\sigma_y}{\bar{y}}\right) \times 100 = \frac{32.68}{38} \times 100 = 86$

Since $\bar{x} > \bar{y}$, cricketer A is the better run getter. Since C.V of A < C.V of B, cricketer A is also the more consistent player.


Problem 8. The mean and standard deviation of 200 items are 
found to be 60 and 20. If at the time of calculation two items are 
wrongly taken as 3 and 67 instead of 13 and 17, Find the correct 
mean and standard deviation. 

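Solution. Here n = 200; $\bar{x} = 60$; $\sigma = 20$.

$\bar{x} = 60 \Rightarrow \frac{\sum x_i}{200} = 60$
$\therefore \sum x_i = 12000$
Corrected $\sum x_i = 12000 - (3 + 67) + (13 + 17) = 11960$

$\therefore$ Corrected $\bar{x} = \frac{11960}{200} = 59.8$

Now $\sigma^2 = \frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2$. Hence $20^2 = \frac{\sum x_i^2}{200} - (60)^2$

$\therefore \sum x_i^2 = 200(20^2 + 60^2) = 800000$
After correction $\sum x_i^2 = 800000 - (3^2 + 67^2) + (13^2 + 17^2) = 795960$
$\therefore$ Corrected $\sigma^2 = \frac{795960}{200} - (59.8)^2 = 403.76$

$\therefore \sigma = 20.09$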
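
The correction procedure of Problem 8 (recover $\sum x$ and $\sum x^2$ from the reported mean and S.D, swap the wrong items for the right ones, and recompute) can be sketched as follows. The function name and argument layout are hypothetical.

```python
from math import sqrt

def corrected_mean_sd(n, mean, sd, wrong, right):
    """Recompute mean and S.D. after replacing wrongly recorded items."""
    # Recover the totals implied by the reported statistics.
    total = n * mean
    total_sq = n * (sd ** 2 + mean ** 2)
    # Remove the wrong items and add the correct ones.
    total += sum(right) - sum(wrong)
    total_sq += sum(r * r for r in right) - sum(w * w for w in wrong)
    new_mean = total / n
    new_var = total_sq / n - new_mean ** 2
    return new_mean, sqrt(new_var)

new_mean, new_sd = corrected_mean_sd(200, 60, 20, wrong=[3, 67], right=[13, 17])
print(round(new_mean, 2), round(new_sd, 2))
```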


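Problem 9. Find (i) the mean deviation from the mean (ii) variance of the arithmetic progression a, a+d, a+2d, ......, a+2nd.
Solution. There are 2n + 1 terms in the A.P.

$\bar{x} = \frac{1}{2n+1}\left[a + (a + d) + \cdots + (a + 2nd)\right]$
$= \frac{1}{2n+1}\left[(2n + 1)a + \frac{2n(2n+1)}{2}d\right]$
$= a + nd$
(i) Mean deviation from mean $= \frac{1}{2n+1}\sum |x_i - \bar{x}|$
$= \frac{1}{2n+1}\left[2d(1 + 2 + \cdots + n)\right]$ (taking d > 0)
$= \frac{n(n+1)d}{2n+1}$
(ii) Variance $\sigma^2 = \frac{1}{2n+1}\sum (x_i - \bar{x})^2$
$= \frac{1}{2n+1}\left[2d^2(1^2 + 2^2 + \cdots + n^2)\right]$
$= \frac{2d^2}{2n+1} \cdot \frac{n(n+1)(2n+1)}{6}$

$= \frac{1}{3}n(n + 1)d^2$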


Exercises. 

1. Calculate mean, S.D and C.V of the marks obtained by 20 
students in an examination. 
62 85 73 81 74 58 66 72 54 84 
65 50 83 62 85 52 80 86 71 75 


2.Calculate the standard deviation from the following data of 
income of 10 employees of a firm. 
100 120 140 120 180 175 185 130 2200 150 


3. Prepare a frequency table from the following passage taking consonants and vowels in each word as two variables x and y. Find $\bar{x}$, $\bar{y}$, $\sigma_x$ and $\sigma_y$.

4. Calculate the mean deviation from (i) mean (ii) median (iii) mode for the following data.


Size of item    Frequency    Size of item    Frequency
3-4                 3        7-8                85
4-5                 7        8-9                32
5-6                22        9-10                8
6-7                60        Total             217


5. Find the standard deviation of the following heights of 100 male students.


Height in inches    60-62    63-65    66-68    69-71    72-74
No. of students       5       18       42       27        8
UNIT-IV MOMENTS SKEWNESS AND KURTOSIS


4.1 INTRODUCTION 


In previous chapters we have introduced certain measures of central tendency and measures of dispersion with the aim of finding a "few statistical constants" that represent the entire data. In this chapter we introduce some more statistical constants known as moments.


4.2 MOMENTS
Definition. The $r^{th}$ moment about any point A, denoted by $\mu_r'$, of a frequency distribution $(f_i / x_i)$ is defined by

$\mu_r' = \frac{\sum f_i (x_i - A)^r}{N}$

When A = 0 we get $\mu_r' = \frac{\sum f_i x_i^r}{N}$, which is the $r^{th}$ moment about the origin.

The $r^{th}$ moment about the arithmetic mean $\bar{x}$ of a frequency distribution is given by

$\mu_r = \frac{\sum f_i (x_i - \bar{x})^r}{N}$

$\mu_r$ is also called the $r^{th}$ central moment.


Note 1. The first moment about the origin coincides with the A.M of the frequency distribution, and $\mu_2$ is nothing but the variance of the frequency distribution.

Note 2. $\mu_1 = \frac{\sum f_i (x_i - \bar{x})}{N} = 0$

Note 3. $\mu_1' = \frac{\sum f_i (x_i - A)}{N} = \frac{\sum f_i x_i}{N} - A = \bar{x} - A$
$\therefore \bar{x} = A + \mu_1'$
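We now establish a relation between $\mu_r$ and $\mu_r'$.

Theorem 4.1

$\mu_r = \mu_r' - {}^rC_1\,\mu_{r-1}'\,\mu_1' + {}^rC_2\,\mu_{r-2}'\,(\mu_1')^2 - \cdots + (-1)^{r-1}(r - 1)(\mu_1')^r$


Proof. $\mu_r = \frac{1}{N}\sum f_i (x_i - \bar{x})^r$
$= \frac{1}{N}\sum f_i (x_i - A + A - \bar{x})^r$
$= \frac{1}{N}\sum f_i (x_i - A - d)^r$ where $d = \bar{x} - A = \mu_1'$
$= \frac{1}{N}\sum f_i\left[(x_i - A)^r - {}^rC_1 (x_i - A)^{r-1} d + {}^rC_2 (x_i - A)^{r-2} d^2 - \cdots + (-1)^{r-1}\,{}^rC_{r-1} (x_i - A)\,d^{r-1} + (-1)^r d^r\right]$

$= \mu_r' - {}^rC_1\,\mu_{r-1}'\,d + {}^rC_2\,\mu_{r-2}'\,d^2 - \cdots + (-1)^{r-1}\,r\,\mu_1'\,d^{r-1} + (-1)^r d^r$

$= \mu_r' - {}^rC_1\,\mu_{r-1}'\,\mu_1' + {}^rC_2\,\mu_{r-2}'\,(\mu_1')^2 - \cdots + (-1)^{r-1}(r - 1)(\mu_1')^r$

(the last two terms combine because $d = \mu_1'$).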


Note. Putting r = 2, 3, 4 in the above theorem we have


(i) $\mu_2 = \mu_2' - (\mu_1')^2$
(ii) $\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3$
(iii) $\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4$
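Theorem 4.2 $\mu_r' = \mu_r + {}^rC_1\,\mu_{r-1}\,\mu_1' + {}^rC_2\,\mu_{r-2}\,(\mu_1')^2 + \cdots + (\mu_1')^r$
Proof. $\mu_r' = \frac{1}{N}\sum f_i (x_i - A)^r$
$= \frac{1}{N}\sum f_i (x_i - \bar{x} + \bar{x} - A)^r$
$= \frac{1}{N}\sum f_i (x_i - \bar{x} + d)^r$ where $d = \bar{x} - A = \mu_1'$
$= \frac{1}{N}\sum f_i\left[(x_i - \bar{x})^r + {}^rC_1 (x_i - \bar{x})^{r-1} d + {}^rC_2 (x_i - \bar{x})^{r-2} d^2 + \cdots + d^r\right]$

$= \mu_r + {}^rC_1\,\mu_{r-1}\,\mu_1' + {}^rC_2\,\mu_{r-2}\,(\mu_1')^2 + \cdots + (\mu_1')^r$
Note. Putting r = 2, 3, 4 in the above theorem and using $\mu_1 = 0$ we have


(i) $\mu_2' = \mu_2 + (\mu_1')^2$
(ii) $\mu_3' = \mu_3 + 3\mu_2\mu_1' + (\mu_1')^3$
(iii) $\mu_4' = \mu_4 + 4\mu_3\mu_1' + 6\mu_2(\mu_1')^2 + (\mu_1')^4$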
Note. When the variable $x_i$ is changed into another variable $u_i$ where $u_i = \frac{x_i - A}{h}$, the $r^{th}$ moment $\mu_r$ of the variable $x_i$ is given by
$\mu_r = h^r\left[\frac{1}{N}\sum f_i (u_i - \bar{u})^r\right]$
Thus the $r^{th}$ moment $\mu_r$ of the variable $x_i$ is $h^r$ times the $r^{th}$ moment of the variable $u_i$.
Definition. Karl Pearson's $\beta$ and $\gamma$ coefficients are defined as follows.

$\beta_1 = \frac{\mu_3^2}{\mu_2^3}$ and $\beta_2 = \frac{\mu_4}{\mu_2^2}$

$\gamma_1 = \sqrt{\beta_1}$ and $\gamma_2 = \beta_2 - 3$


The above four coefficients depend upon the first four central moments. They are pure numbers independent of the units in which the variable $x_i$ is expressed. Also their values are not affected by change of origin and scale. These constants are used in section 4.3 in the study of skewness and kurtosis.
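
The conversion from moments about an arbitrary point A to central moments, followed by the $\beta$ and $\gamma$ coefficients, is purely mechanical. The sketch below assumes the four moments about A are already known; the function names are illustrative, and the sample moments (1, 2.5, 5.5, 16) are used only as test input.

```python
from math import sqrt

def central_from_raw(m1, m2, m3, m4):
    """Central moments (mu2, mu3, mu4) from moments m1..m4 about a point A."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return mu2, mu3, mu4

def pearson_coefficients(mu2, mu3, mu4):
    """Return (beta1, beta2, gamma1, gamma2) as defined above."""
    beta1 = mu3 ** 2 / mu2 ** 3
    beta2 = mu4 / mu2 ** 2
    return beta1, beta2, sqrt(beta1), beta2 - 3

mu2, mu3, mu4 = central_from_raw(1, 2.5, 5.5, 16)
beta1, beta2, gamma1, gamma2 = pearson_coefficients(mu2, mu3, mu4)
print(mu2, mu3, mu4, beta1, round(beta2, 3))
```

Because $\beta_1$ involves $\mu_3^2$, it loses the sign of $\mu_3$; code that needs the direction of skewness should inspect $\mu_3$ itself.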


4.3 SKEWNESS AND KURTOSIS 


If the values of a variable $x_i$ are distributed symmetrically about the mean, which is taken as the origin, then for every positive value of $x_i - \bar{x}$ there corresponds an equal negative value. Hence when these values are cubed they retain their signs and cancel on addition.

$\therefore \sum f_i (x_i - \bar{x})^3 = 0$. Hence $\mu_3 = 0$.

Thus in the case of a symmetrical distribution $\mu_3 = 0$ and hence $\beta_1 = 0$. If a distribution fails to be symmetrical (asymmetrical) then we say that it is a skewed distribution. Thus skewness means lack of symmetry. From the above discussion we see that $\beta_1$ can be taken as a measure of skewness. Since $\beta_1 = \mu_3^2/\mu_2^3$ is never negative, the direction of skewness is read from the sign of $\mu_3$: we say that a frequency distribution has positive skewness if $\mu_3 > 0$ and negative skewness if $\mu_3 < 0$.

For a symmetric distribution the mean, median and mode coincide. Hence for an asymmetrical distribution the distance between the mean and the mode, or between the mean and the median, may be used as a measure of skewness.

$\therefore$ Mean - Mode and Mean - Median may be taken as measures of skewness.

To make these measures free from units of measurement, so that comparison with other distributions may be possible, we divide them by a suitable measure of dispersion and obtain the following coefficients of skewness.

(i) Karl Pearson's coefficient of skewness.

$\frac{\text{Mean} - \text{Mode}}{\sigma}$ and $\frac{3(\text{Mean} - \text{Median})}{\sigma}$ are called Karl Pearson's coefficients of skewness.
(ii) Bowley's coefficient of skewness is given by

$\frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$


Kurtosis

Definition. Kurtosis is the degree of peakedness of a distribution, usually taken relative to a normal distribution. It is measured by the coefficient $\beta_2$.

For a normal curve $\beta_2 = 3$ (or $\gamma_2 = 0$) and such a curve is known as mesokurtic.
For a curve which is flatter than the normal curve $\beta_2 < 3$ (or $\gamma_2 < 0$) and such a curve is known as platykurtic.
For a curve which is more peaked than the normal curve $\beta_2 > 3$ (or $\gamma_2 > 0$) and such a curve is known as leptokurtic.


Solved problems.


Problem 1. Calculate the first four central moments from the following data to find $\beta_1$ and $\beta_2$ and discuss the nature of the distribution.


x    0    1    2    3    4    5    6
f    5   15   17   25   19   14    5
Solution.
Here $\bar{x} = \frac{\sum f_i x_i}{N} = \frac{300}{100} = 3$
Choosing $u_i = x_i - \bar{x} = x_i - 3$ we have the following table.

x_i    f_i    u_i    f_i u_i    f_i u_i^2    f_i u_i^3    f_i u_i^4
0        5    -3      -15          45          -135          405
1       15    -2      -30          60          -120          240
2       17    -1      -17          17           -17           17
3       25     0        0           0             0            0
4       19     1       19          19            19           19
5       14     2       28          56           112          224
6        5     3       15          45           135          405
Total  100     -        0         242            -6         1310

$\mu_1 = \frac{1}{N}\sum f_i (x_i - 3) = 0$
$\mu_2 = \frac{1}{N}\sum f_i (x_i - 3)^2 = \frac{242}{100} = 2.42$
$\mu_3 = \frac{1}{N}\sum f_i (x_i - 3)^3 = \frac{-6}{100} = -0.06$
$\mu_4 = \frac{1}{N}\sum f_i (x_i - 3)^4 = \frac{1310}{100} = 13.10$

$\beta_1 = \frac{\mu_3^2}{\mu_2^3} = \frac{(-0.06)^2}{(2.42)^3} = \frac{0.0036}{14.1725} = 0.0003$

$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{13.10}{(2.42)^2} = \frac{13.10}{5.8564} = 2.237$

Since $\beta_1$ is very nearly 0 the distribution is almost symmetric; as $\mu_3 < 0$ the slight skewness present is negative.
Since $\beta_2 = 2.237 < 3$ the distribution is platykurtic.
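
The table in Problem 1 amounts to evaluating $\mu_r = \frac{1}{N}\sum f_i (x_i - \bar{x})^r$ for r = 2, 3, 4. A minimal sketch of that computation follows; the function name `central_moment` is an assumption made for illustration.

```python
def central_moment(values, freqs, r):
    """r-th central moment of a frequency distribution."""
    n = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n
    return sum(f * (x - mean) ** r for x, f in zip(values, freqs)) / n

xs = [0, 1, 2, 3, 4, 5, 6]
fs = [5, 15, 17, 25, 19, 14, 5]

m2 = central_moment(xs, fs, 2)
m3 = central_moment(xs, fs, 3)
m4 = central_moment(xs, fs, 4)
beta1 = m3 ** 2 / m2 ** 3
beta2 = m4 / m2 ** 2
print(m2, m3, m4, round(beta1, 4), round(beta2, 3))
```

The sign of `m3` is what indicates the direction of skewness; `beta1` only measures its size.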


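Problem 2. Calculate the values of $\beta_1$ and $\beta_2$ for the distribution.


Solution. Taking $u_i = \frac{x_i - 24.5}{10}$ we get the following table.

x_i     f_i    u_i    f_i u_i    f_i u_i^2    f_i u_i^3    f_i u_i^4
 4.5     11    -2      -22          44          -88          176
14.5     20    -1      -20          20          -20           20
24.5     16     0        0           0            0            0
34.5     36     1       36          36           36           36
44.5     17     2       34          68          136          272
Total   100     0       28         168           64          504


Here we have chosen A = 24.5 and h = 10.
$\mu_1' = \frac{1}{N}\sum f_i (x_i - A) = \left(\frac{1}{N}\sum f_i u_i\right) \times h = \frac{28}{100} \times 10 = 2.8$

$\mu_2' = \left(\frac{1}{N}\sum f_i u_i^2\right) \times h^2 = \frac{168}{100} \times 10^2 = 168$

$\mu_3' = \left(\frac{1}{N}\sum f_i u_i^3\right) \times h^3 = \frac{64}{100} \times 10^3 = 640$

$\mu_4' = \left(\frac{1}{N}\sum f_i u_i^4\right) \times h^4 = \frac{504}{100} \times 10^4 = 50400$

Now, $\mu_1 = 0$
$\mu_2 = \mu_2' - (\mu_1')^2 = 168 - (2.8)^2 = 160.16$

$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3 = 640 - 3 \times 168 \times 2.8 + 2(2.8)^3$
$= -727.296$

$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4$
$= 50400 - 4 \times 640 \times 2.8 + 6 \times 168 \times (2.8)^2 - 3(2.8)^4$
$= 50950.323$

Now, $\beta_1 = \frac{\mu_3^2}{\mu_2^3} = 0.129$ (verify)

$\beta_2 = \frac{\mu_4}{\mu_2^2} = 1.986$ (verify)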


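Problem 3. The first four moments of a distribution about x = 2 are 1, 2.5, 5.5 and 16. Calculate the four moments (i) about the mean (ii) about zero.

Solution. Given $\mu_1' = 1$; $\mu_2' = 2.5$; $\mu_3' = 5.5$; $\mu_4' = 16$ where A = 2.


(i) Moments about the mean.

$\mu_1 = 0$
$\mu_2 = \mu_2' - (\mu_1')^2 = 2.5 - 1 = 1.5$
$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3 = 5.5 - 3 \times 2.5 + 2 = 0$

$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4$
$= 16 - 4 \times 5.5 + 6 \times 2.5 - 3 = 6$


(ii) Moments about zero.
We have $\bar{x} = A + \mu_1'$ (refer Note 3 in 4.2)

$= 2 + 1 = 3$
Now the first moment about zero is $\mu_1' = \frac{1}{N}\sum f_i (x_i - 0) = \bar{x} = 3$.
Taking A = 0, so that $\mu_1' = 3$, in the note following Theorem 4.2 we get
$\mu_2' = \mu_2 + (\mu_1')^2 = 1.5 + 3^2 = 10.5$
$\mu_3' = \mu_3 + 3\mu_2\mu_1' + (\mu_1')^3 = 0 + 3 \times 1.5 \times 3 + 3^3 = 40.5$
$\mu_4' = \mu_4 + 4\mu_3\mu_1' + 6\mu_2(\mu_1')^2 + (\mu_1')^4$
$= 6 + (4 \times 0 \times 3) + (6 \times 1.5 \times 3^2) + 3^4 = 168$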
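Problem 4. The first three moments about the origin are given by

$\mu_1' = \frac{1}{2}(n + 1)$; $\mu_2' = \frac{1}{6}(n + 1)(2n + 1)$; $\mu_3' = \frac{1}{4}n(n + 1)^2$. Examine the skewness of the distribution.


Solution. $\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3$

$= \frac{1}{4}n(n + 1)^2 - 3 \times \frac{1}{6}(n + 1)(2n + 1) \times \frac{1}{2}(n + 1) + 2 \times \frac{1}{8}(n + 1)^3$
$= \frac{1}{4}(n + 1)^2\left[n - (2n + 1) + (n + 1)\right]$
$= \frac{1}{4}(n + 1)^2 \times 0 = 0$

$\mu_2 = \mu_2' - (\mu_1')^2 = \frac{1}{6}(n + 1)(2n + 1) - \frac{1}{4}(n + 1)^2$
$= \frac{1}{12}(n + 1)\left[2(2n + 1) - 3(n + 1)\right]$
$= \frac{1}{12}(n + 1)(n - 1)$
$= \frac{1}{12}(n^2 - 1)$
$\therefore \mu_2 \ne 0$ if $n \ne 1$.

$\therefore$ When n > 1, $\beta_1 = \frac{\mu_3^2}{\mu_2^3} = 0$.
Hence the distribution is symmetric.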


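Problem 5. For a frequency distribution $(f_i / x_i)$ show that $\beta_2 \ge 1$.


Solution. We have $\beta_2 = \frac{\mu_4}{\mu_2^2}$

To prove $\beta_2 \ge 1$ it is enough to prove that $\mu_4 \ge \mu_2^2$.

Now, $\mu_4 \ge \mu_2^2 \iff \frac{\sum f_i (x_i - \bar{x})^4}{N} \ge \left[\frac{\sum f_i (x_i - \bar{x})^2}{N}\right]^2$

$\iff \frac{\sum f_i z_i^2}{N} - \left(\frac{\sum f_i z_i}{N}\right)^2 \ge 0$ where $z_i = (x_i - \bar{x})^2$

$\iff \sigma_z^2 \ge 0$, which is true.

Hence $\beta_2 \ge 1$.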
Exercises.


1. For the following data calculate Karl Pearson's coefficient of skewness.

(i)
Wages in Rs.    10 11 12 13 14 15
Frequency       4 10 8 5 1
(ii)
Size         7    8    9   10   11   12
Frequency    6    9   13    8    5    4


2. Find Karl Pearson's coefficient of skewness for the following data.

(i)
Age      Students    Age      Students
10-12        4       18-20       20
12-14       10       20-22       14
14-16       16       22-24        6
16-18       30       Total      100
(ii)
Wage           No. of Workers    Wage           No. of Workers
Above Rs.5         120           Above Rs.55         58
Above Rs.15        105           Above Rs.65         42
Above Rs.25         96           Above Rs.75         12
Above Rs.35         85           Above Rs.85          0
Above Rs.45         72           Total              590


3. Karl Pearson's coefficient of skewness of a distribution is 0.4, its S.D is 8 and mean 30. Find the mode and median.


4. Calculate the first four moments of the following distribution about the mean. Find $\beta_1$ and $\beta_2$ and hence comment on the nature of the distribution.
x_i    0    1    2    3    4    5    6    7    8
y_i    1    8   28   56   70   56   28    8    1
UNIT-V CURVE FITTING


5.1 Introduction


So far we have introduced several statistical constants, such as measures of central tendency, measures of dispersion and measures of skewness and kurtosis, in order to characterize a given set of sample data drawn from a population. Another important and useful method employed to understand the parent population is to discover a functional relationship between the variables comprising the sample data.


Let $x_i$, i = 1, 2, 3, ......, n, be the values of the independent variable x and let $y_i$ be the corresponding values of the dependent variable y. If the points $(x_i, y_i)$, i = 1, 2, ....., n, are plotted on graph paper we obtain a diagram called a scatter diagram. From the scatter diagram we can judge whether there is a functional relationship between $x_i$ and $y_i$. The process of finding such a functional relationship between the variables is called curve fitting. Curve fitting is useful in the study of correlation and regression, which will be dealt with in later units. For example, the lines of regression can be got by fitting a linear curve to a given bivariate distribution. The properties of the curve fitted to given data can be used to learn the properties of the parent population.
UNIT-VI PRINCIPLE OF LEAST SQUARES


6.1 INTRODUCTION


Among the many methods available for curve fitting the most popular method is the principle of least squares. Let $(x_i, y_i)$, where i = 1, 2, ....., n, be the observed set of values of the variables (x, y). Let y = f(x) be a functional relationship sought between the variables x and y.


Then $d_i = y_i - f(x_i)$, which is the difference between the observed value of y and the value determined by the functional relation, is called the residual. The principle of least squares states that the parameters involved in f(x) should be chosen in such a way that $\sum d_i^2$ is minimum.


6.2 Fitting a straight line 


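Consider the fitting of the straight line y = ax + b to the data $(x_i, y_i)$, i = 1, 2, ...., n.


The residual $d_i$ is given by $d_i = y_i - (ax_i + b)$.


$\therefore \sum d_i^2 = \sum (y_i - ax_i - b)^2 = R$ (say). According to the principle of least squares we have to determine the parameters a and b so that R is minimum.


$\frac{\partial R}{\partial a} = 0 \Rightarrow -2\sum (y_i - ax_i - b)\,x_i = 0$

$\Rightarrow \sum (x_i y_i - ax_i^2 - bx_i) = 0$
$\therefore a\sum x_i^2 + b\sum x_i = \sum x_i y_i$ ........ (1)
$\frac{\partial R}{\partial b} = 0 \Rightarrow -2\sum (y_i - ax_i - b) = 0$

$\therefore a\sum x_i + nb = \sum y_i$ ........ (2)


Equations (1) and (2) are called normal equations from which a and b can be found.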
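
The two normal equations can be solved in closed form by elimination. The sketch below does this directly; the function name `fit_line` is illustrative, and the sample data are an arbitrary small data set chosen for the demonstration.

```python
def fit_line(xs, ys):
    """Least-squares line y = a*x + b obtained from the normal equations.

    a*sum(x^2) + b*sum(x) = sum(x*y)
    a*sum(x)   + n*b      = sum(y)
    """
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # zero only if all x values coincide
    a = (n * sxy - sx * sy) / det
    b = (sy * sxx - sx * sxy) / det
    return a, b

a, b = fit_line([0, 1, 2, 3, 4], [2.1, 3.5, 5.4, 7.3, 8.2])
print(round(a, 2), round(b, 2))
```

For these data the sums are $\sum x = 10$, $\sum y = 26.5$, $\sum xy = 69$ and $\sum x^2 = 30$, so the normal equations are 30a + 10b = 69 and 10a + 5b = 26.5.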


6.3 Fitting a second degree parabola. 


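Consider the fitting of the parabola $y = ax^2 + bx + c$ to the data $(x_i, y_i)$ where i = 1, 2, ......, n.


The residual $d_i$ is given by $d_i = y_i - (ax_i^2 + bx_i + c)$.
$\therefore \sum d_i^2 = \sum (y_i - ax_i^2 - bx_i - c)^2 = R$ (say)


By the principle of least squares we have to determine the parameters a, b and c so that R is minimum.


$\frac{\partial R}{\partial a} = 0 \Rightarrow -2\sum (y_i - ax_i^2 - bx_i - c)\,x_i^2 = 0$

$\Rightarrow \sum x_i^2 y_i - a\sum x_i^4 - b\sum x_i^3 - c\sum x_i^2 = 0$

$\therefore a\sum x_i^4 + b\sum x_i^3 + c\sum x_i^2 = \sum x_i^2 y_i$ ........ (1)

$\frac{\partial R}{\partial b} = 0 \Rightarrow -2\sum (y_i - ax_i^2 - bx_i - c)\,x_i = 0$

$\Rightarrow \sum x_i y_i - a\sum x_i^3 - b\sum x_i^2 - c\sum x_i = 0$
$\therefore a\sum x_i^3 + b\sum x_i^2 + c\sum x_i = \sum x_i y_i$ ........ (2)
$\frac{\partial R}{\partial c} = 0 \Rightarrow -2\sum (y_i - ax_i^2 - bx_i - c) = 0$
$\Rightarrow \sum y_i - a\sum x_i^2 - b\sum x_i - nc = 0$
$\therefore a\sum x_i^2 + b\sum x_i + nc = \sum y_i$ ........ (3)
Equations (1), (2) and (3) are called normal equations from which a, b and c can be found.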
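
The three normal equations form a 3 x 3 linear system. The following sketch solves it by Cramer's rule in plain Python, which is adequate for small hand-sized problems; the helper names are hypothetical, and the sample points were chosen to lie exactly on $y = 2x^2 - 3x + 1$ so that the result is easy to check.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def fit_parabola(xs, ys):
    """Least-squares parabola y = a*x^2 + b*x + c via the normal equations."""
    n = len(xs)
    s1 = sum(xs)
    s2 = sum(x ** 2 for x in xs)
    s3 = sum(x ** 3 for x in xs)
    s4 = sum(x ** 4 for x in xs)
    t0 = sum(ys)
    t1 = sum(x * y for x, y in zip(xs, ys))
    t2 = sum(x * x * y for x, y in zip(xs, ys))
    matrix = [[s4, s3, s2], [s3, s2, s1], [s2, s1, n]]
    rhs = [t2, t1, t0]
    d = det3(matrix)
    coeffs = []
    for col in range(3):
        # Cramer's rule: replace one column by the right-hand side.
        replaced = [row[:] for row in matrix]
        for r in range(3):
            replaced[r][col] = rhs[r]
        coeffs.append(det3(replaced) / d)
    return tuple(coeffs)

a, b, c = fit_parabola([0, 1, 2, 3], [1, 0, 3, 10])
print(a, b, c)
```

For larger or badly scaled data sets a library least-squares solver would normally be preferred over Cramer's rule.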


Note. If the given data is not in linear form it can be brought to linear 
form by some suitable transformations of variable. Then using the 
principle of least squares the curve of best fit can be achieved. 


Curves of the form (i) $y = bx^a$ (ii) $y = ab^x$ (iii) $y = ae^{bx}$ are of special interest and are dealt with here in the solved problems.


Solved Problems 


Problem 1. Fit a straight line to the following data.


X    0     1     2     3     4
Y    2.1   3.5   5.4   7.3   8.2


Solution. Let the straight line to be fitted to the data be y = ax + b.

Then the parameters a and b are got from the normal equations

$\sum y_i = a\sum x_i + nb$

$\sum x_i y_i = a\sum x_i^2 + b\sum x_i$


x_i     y_i     x_i y_i    x_i^2
0       2.1      0           0
1       3.5      3.5         1
2       5.4     10.8         4
3       7.3     21.9         9
4       8.2     32.8        16
Total  26.5     69.0        30

Here $\sum x_i = 10$ and n = 5.

Hence the normal equations are
10a + 5b = 26.5 ...... (1)
30a + 10b = 69 ...... (2)
Solving (1) and (2) we get a = 1.6 and b = 2.1.
$\therefore$ The straight line fitted to the data is y = 1.6x + 2.1


Problem 2. Fit a straight line to the following data and estimate the value 
of y corresponding to x= 6 


X 0 5 10 15 20 25 
Y 12 15 17 22 24 30 
Solution.

Take $u_i = \frac{1}{5}(x_i - 15)$ and $v_i = y_i - 22$.
Let v = au + b be the straight line to be fitted.

Then the normal equations giving the parameters a and b are

$\sum v_i = a\sum u_i + nb$

$\sum u_i v_i = a\sum u_i^2 + b\sum u_i$

x_i    y_i    u_i    v_i    u_i v_i    u_i^2
 0     12     -3     -10      30         9
 5     15     -2      -7      14         4
10     17     -1      -5       5         1
15     22      0       0       0         0
20     24      1       2       2         1
25     30      2       8      16         4
Total   -     -3     -12      67        19


$\therefore$ The normal equations are
-3a + 6b = -12 ....... (1)
19a - 3b = 67 ....... (2)

Solving for a and b we get a = 3.49 and b = -0.26.

$\therefore$ The straight line to be fitted becomes $y - 22 = 3.49\left(\frac{x - 15}{5}\right) - 0.26$

$\therefore 5y - 110 = 3.49x - 52.35 - 1.30$

$\therefore 5y = 3.49x + 56.35$

$\therefore y = 0.698x + 11.27$

Now for x = 6 the estimated value of y is y = 0.698 × 6 + 11.27 = 15.458


Problem 3. Fit a second degree parabola by taking $x_i$ as the independent variable.


X    0    1    2    3    4
Y    1    5   10   22   38
Solution

Let the second degree parabola to be fitted to the data be $y = ax^2 + bx + c$. Then we have the following normal equations to find a, b, c.

$a\sum x_i^4 + b\sum x_i^3 + c\sum x_i^2 = \sum x_i^2 y_i$

$a\sum x_i^3 + b\sum x_i^2 + c\sum x_i = \sum x_i y_i$

$a\sum x_i^2 + b\sum x_i + nc = \sum y_i$


x_i    y_i    x_i y_i    x_i^2    x_i^2 y_i    x_i^3    x_i^4
0       1        0         0          0          0         0
1       5        5         1          5          1         1
2      10       20         4         40          8        16
3      22       66         9        198         27        81
4      38      152        16        608         64       256
Total
10     76      243        30        851        100       354

Now, the normal equations become

354a + 100b + 30c = 851 ....... (1)

100a + 30b + 10c = 243 ....... (2)

30a + 10b + 5c = 76 ....... (3)


Solving for a, b and c we get a = 2.21; b = 0.24 and c = 1.43 (verify).
$\therefore$ The second degree parabola is $y = 2.21x^2 + 0.24x + 1.43$


Problem 4. Fit the curve $y = bx^a$ to the following data.


X      1      2     3     4     5    6
Y   1200    900   600   200   110   50


Solution. $y = bx^a$
$\therefore \log y = a \log x + \log b$
Let log y = Y and log x = X.

Then the curve is transformed into Y = AX + B where A = a and B = log b. Hence the normal equations now become

$\sum Y = A\sum X + nB$
$\sum XY = A\sum X^2 + B\sum X$

x      y       X = log x    Y = log y     XY       X^2
1    1200       0            3.0792      0        0
2     900       0.3010       2.9542      0.889    0.091
3     600       0.4771       2.7782      1.325    0.228
4     200       0.6021       2.3010      1.385    0.363
5     110       0.6990       2.0414      1.427    0.489
6      50       0.7782       1.6990      1.322    0.606
Total   -       2.8574      14.8530      6.348    1.777


$\therefore$ The normal equations are
2.9A + 6B = 14.9 approximately
1.8A + 2.9B = 6.3 approximately
$\therefore$ A = -2.3 and B = 3.6 (verify)
$\therefore$ A = a = -2.3 and B = log b = 3.6
$\therefore$ a = -2.3 and b = antilog 3.6 = 3981
$\therefore$ The required equation of the curve is $y = 3981\,x^{-2.3}$
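
The worked solution above rounds the sums to one decimal place before solving. The normal equations for this data set are fairly sensitive to that rounding, so it is worth repeating the fit with unrounded logarithms. The sketch below does so; the function name `fit_power` is an assumption, and base-10 logarithms are used to match the text.

```python
from math import log10

def fit_power(xs, ys):
    """Least-squares fit of y = b * x**a through log10 y = a*log10 x + log10 b."""
    X = [log10(x) for x in xs]
    Y = [log10(y) for y in ys]
    n = len(X)
    sx, sy = sum(X), sum(Y)
    sxx = sum(v * v for v in X)
    sxy = sum(p * q for p, q in zip(X, Y))
    det = n * sxx - sx * sx
    a = (n * sxy - sx * sy) / det          # slope A = a
    log_b = (sy * sxx - sx * sxy) / det    # intercept B = log10 b
    return a, 10 ** log_b

a, b = fit_power([1, 2, 3, 4, 5, 6], [1200, 900, 600, 200, 110, 50])
print(round(a, 2), round(b))
```

Carried at full precision, the exponent comes out near -1.75 and the constant near 2.0 × 10^3, noticeably different from the values obtained after rounding the coefficients to one decimal place. Either curve is a least-squares fit only in the transformed (log-log) variables.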


Problem 5. Explain the method of fitting the curve of good fit $y = ae^{bx}$ (a > 0).


Solution. $y = ae^{bx}$ ...... (1)
$\therefore \log y = \log a + bx \log e$ ...... (2)
Let Y = log y; B = log a; A = b log e.
$\therefore$ (2) becomes Y = Ax + B.
This is a linear equation in x and Y whose normal equations are
$\sum x_i Y_i = A\sum x_i^2 + B\sum x_i$
$\sum Y_i = A\sum x_i + nB$


From the two normal equations we can get the values of A and B, and consequently a and b can be obtained from a = antilog (B) and $b = \frac{A}{\log e}$. Thus the curve of best fit (1) can be obtained.
Problem 6. Explain the method of fitting the curve $y = ka^{bx}$ (a, k > 0), obtaining the normal equations by the method of least squares.


Solution. The curve can be transformed to the form of a straight line as follows.

$\log y = \log k + b(\log a)\,x$ ; (a, k > 0)
Let log y = Y; log k = B; b log a = A.
Hence the above equation takes the form Y = Ax + B.

By the principle of least squares the normal equations to find A and B of the above straight line are

$\sum x_i Y_i = A\sum x_i^2 + B\sum x_i$
$\sum Y_i = A\sum x_i + nB$

After finding the values of A and B from the normal equations we obtain k = antilog B and $a^b$ = antilog A, and hence the curve $y = ka^{bx}$. (Only the product b log a is determined by the data, so a and b cannot be found separately.)


Problem 7. Fit a curve of the form $y = ab^x$ to the following data.


Year (x)                  1951   1952   1953   1954   1955   1956   1957
Production in tons (y)     201    263    314    395    427    504    612


Solution. $y = ab^x$ ....... (1)
$\therefore \log y = \log a + x \log b$ ....... (2)
Let log y = Y; log a = B and log b = A.
$\therefore$ (2) becomes Y = AX + B ....... (3) where X = x - 1954.

x       y     X = x - 1954    Y = log y      XY        X^2
1951   201       -3            2.3032      -6.9096      9
1952   263       -2            2.4200      -4.8400      4
1953   314       -1            2.4969      -2.4969      1
1954   395        0            2.5966       0           0
1955   427        1            2.6304       2.6304      1
1956   504        2            2.7024       5.4048      4
1957   612        3            2.7868       8.3604      9
Total              0           17.9363       2.1491     28


The normal equations for (3) are
$\sum XY = A\sum X^2 + B\sum X$
$\sum Y = A\sum X + nB$
i.e. 28A = 2.1491 ....... (4)
7B = 17.9363 ....... (5)
Solving the above equations we get A = 0.0768 and B = 2.5623.
$\therefore$ b = antilog A = antilog 0.0768 = 1.19 (approximately)
a = antilog B = antilog 2.5623 = 365.01 (approximately)
$\therefore$ The curve of good fit is $y = (365.01)(1.19)^X$
$= (365.01)(1.19)^{x - 1954}$
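
The same log-linear procedure can be sketched in code. The function below fits $y = ab^X$ with X measured from a chosen origin, as in Problem 7; the name `fit_exponential` and its `origin` parameter are illustrative assumptions.

```python
from math import log10

def fit_exponential(xs, ys, origin=0):
    """Fit y = a * b**(x - origin) by least squares on log10 y."""
    X = [x - origin for x in xs]
    Y = [log10(y) for y in ys]
    n = len(X)
    sx, sy = sum(X), sum(Y)
    sxx = sum(v * v for v in X)
    sxy = sum(p * q for p, q in zip(X, Y))
    det = n * sxx - sx * sx
    A = (n * sxy - sx * sy) / det      # A = log10 b
    B = (sy * sxx - sx * sxy) / det    # B = log10 a
    return 10 ** B, 10 ** A

years = [1951, 1952, 1953, 1954, 1955, 1956, 1957]
production = [201, 263, 314, 395, 427, 504, 612]

a, b = fit_exponential(years, production, origin=1954)
print(round(a, 1), round(b, 3))
```

Choosing the middle year as origin makes $\sum X = 0$, which is why the two normal equations decouple in the hand computation.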
Exercises 


1. Fit a straight line to the following data regarding x as the independent variable.


(i)

X    0    1     2     3     4
Y    1    1.8   3.3   4.5   6.3

(ii)

Year x                  1911   1921   1931   1941   1951
Production in tons y      10     12      8     10     14
Also estimate the production in 1936.


2. Fit a second degree parabola to the following data taking x as the independent variable.


(i)
X    0    1     2     3     4
Y    1    1.8   1.3   2.5   23


(ii)


3.Fit a curve y = ax” for the following data. 


X 1 2 3 4 5 6 


Y 14 27 40 55 68 300 


4.Fit a curve y = ax" forthe following data. 


X 1 2 3 4 


Y | 2.99 | 4.25 | 5.22 | 6.10 


5. Fit the exponential curve $y = ae^{bx}$ to the following data.


X 0 2 4 


Y 50.2 10 31.62 
UNIT-VII CORRELATION


7.1 INTRODUCTION


So far we have studied the methods of classifying and analysing data relating to a single variable. However, data presenting two sets of related observations may arise in many fields of activity, giving n pairs of corresponding observations $(x_i, y_i)$; i = 1, 2, ..., n.


For example, (i) $x_i$ may represent the height and $y_i$ the weight of a collection of students; (ii) $x_i$ may represent the price of a commodity and $y_i$ the corresponding demand. Such data $(x_i, y_i)$; i = 1, 2, ..., n is called bivariate data.


7.2 CORRELATION 


Definition. Consider a set of bivariate data $(x_i, y_i)$; i = 1, 2, ..., n. If there is a change in one variable corresponding to a change in the other variable we say that the variables are correlated.


If the two variables deviate in the same direction the correlation is said to be direct or positive. If they always deviate in opposite directions the correlation is said to be inverse or negative. If the change in one variable corresponds to a proportional change in the other variable then the correlation is said to be perfect.


Height and weight of a batch of students, and income and expenditure of a family, are examples of variables with positive correlation.


Price and demand, and the volume v and pressure p of a perfect gas which obeys the law pv = k where k is a constant, are examples of variables with negative correlation.


Definition. Karl Pearson's coefficient of correlation between the variables x and y is defined by

$\gamma_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n\,\sigma_x \sigma_y}$

where $\bar{x}, \bar{y}$ are the arithmetic means and $\sigma_x, \sigma_y$ the standard deviations of the variables x and y respectively.


Definition. The covariance between x and y is defined by

$\text{cov}(x, y) = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n}$

Hence $\gamma_{xy} = \frac{\text{cov}(x, y)}{\sigma_x \sigma_y}$
Example. The heights and weights of five students are given below.


Height in cm (x)     160   161   162   163   164
Weight in kg (y)      50    53    54    56    57


Here $\bar{x} = 162$; $\bar{y} = 54$; $\sigma_x = \sqrt{2}$ and $\sigma_y = \sqrt{6}$ (verify).

Now

$\sum (x_i - \bar{x})(y_i - \bar{y}) = (-2)(-4) + (-1)(-1) + 0 + (1 \times 2) + (2 \times 3) = 17$

$\therefore \gamma_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n\,\sigma_x \sigma_y} = \frac{17}{5\sqrt{2}\sqrt{6}} = \frac{17 \times \sqrt{12}}{60} = \frac{17 \times 3.46}{60} = 0.98$
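Theorem 7.1. $\gamma_{xy} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{\left[n\sum x_i^2 - (\sum x_i)^2\right]^{1/2}\left[n\sum y_i^2 - (\sum y_i)^2\right]^{1/2}}$

Proof. $\gamma_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n\,\sigma_x \sigma_y}$ ........ (1)

Now, $\sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \bar{x}\sum y_i - \bar{y}\sum x_i + n\bar{x}\bar{y}$
$= \sum x_i y_i - \bar{x}(n\bar{y}) - \bar{y}(n\bar{x}) + n\bar{x}\bar{y}$
$= \sum x_i y_i - n\bar{x}\bar{y}$
$= \sum x_i y_i - \frac{1}{n}\left(\sum x_i\right)\left(\sum y_i\right)$
$= \frac{1}{n}\left[n\sum x_i y_i - \sum x_i \sum y_i\right]$ ........ (2)

Also, $\sigma_x^2 = \frac{1}{n}\sum (x_i - \bar{x})^2$
$= \frac{1}{n}\left[\sum x_i^2 - 2\bar{x}\sum x_i + n(\bar{x})^2\right]$
$= \frac{1}{n}\left[\sum x_i^2 - 2n(\bar{x})^2 + n(\bar{x})^2\right]$
$= \frac{1}{n}\left[\sum x_i^2 - \frac{1}{n}\left(\sum x_i\right)^2\right]$
$= \frac{1}{n^2}\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]$

$\therefore \sigma_x = \frac{1}{n}\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]^{1/2}$ ........ (3)
Similarly, $\sigma_y = \frac{1}{n}\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]^{1/2}$ ........ (4)
Substituting (2), (3) and (4) in (1) we get the required result.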
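
The computational formula of Theorem 7.1 needs only five running sums. A minimal sketch follows; the function name `correlation` is illustrative, and the height and weight figures of the preceding example are used as test data.

```python
from math import sqrt

def correlation(xs, ys):
    """Karl Pearson's coefficient by the formula of Theorem 7.1."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

heights = [160, 161, 162, 163, 164]
weights = [50, 53, 54, 56, 57]
r = correlation(heights, weights)

# Shifting the origin of either variable leaves the coefficient unchanged.
r_shifted = correlation([h - 160 for h in heights], [w - 50 for w in weights])
print(round(r, 2), round(r_shifted, 2))
```

Subtracting a convenient origin before summing keeps the intermediate numbers small, which is the usual reason for doing so in hand computation.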


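The calculation of $\gamma_{xy}$ may frequently be simplified by making use of the following theorem.


Theorem 7.2 The correlation coefficient is independent of the change of origin and scale.


Proof. Let $u_i = \frac{x_i - A}{h}$ and $v_i = \frac{y_i - B}{k}$ where h, k > 0.
$\therefore x_i = A + hu_i$ and $y_i = B + kv_i$.
Hence $\bar{x} = A + h\bar{u}$ and $\bar{y} = B + k\bar{v}$.
$\therefore x_i - \bar{x} = h(u_i - \bar{u})$ and $y_i - \bar{y} = k(v_i - \bar{v})$.
Also $\sigma_x = h\sigma_u$ and $\sigma_y = k\sigma_v$.

$\therefore \gamma_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n\,\sigma_x \sigma_y} = \frac{hk\sum (u_i - \bar{u})(v_i - \bar{v})}{n(h\sigma_u)(k\sigma_v)}$

$= \frac{\sum (u_i - \bar{u})(v_i - \bar{v})}{n\,\sigma_u \sigma_v}$
$= \gamma_{uv}$

Hence $\gamma_{xy} = \gamma_{uv}$.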


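Theorem 7.3 $-1 \le \gamma_{xy} \le 1$


Proof. $\gamma_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n\,\sigma_x \sigma_y}$

$= \frac{\frac{1}{n}\sum (x_i - \bar{x})(y_i - \bar{y})}{\left[\frac{1}{n}\sum (x_i - \bar{x})^2\right]^{1/2}\left[\frac{1}{n}\sum (y_i - \bar{y})^2\right]^{1/2}}$

Let $a_i = x_i - \bar{x}$ and $b_i = y_i - \bar{y}$.

$\therefore \gamma_{xy}^2 = \frac{\left(\sum a_i b_i\right)^2}{\left(\sum a_i^2\right)\left(\sum b_i^2\right)}$

By Schwarz's inequality we have $\left(\sum a_i b_i\right)^2 \le \left(\sum a_i^2\right)\left(\sum b_i^2\right)$.
$\therefore \gamma_{xy}^2 \le 1$. Hence $-1 \le \gamma_{xy} \le 1$.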
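Note 1. If $\gamma_{xy} = 1$ the correlation is perfect and positive.
Note 2. If $\gamma_{xy} = -1$ the correlation is perfect and negative.

Note 3. If $\gamma_{xy} = 0$ the variables are uncorrelated.


The following theorem gives another formula for $\gamma_{xy}$ in terms of $\sigma_x$, $\sigma_y$ and $\sigma_{x-y}$.


Theorem 7.4 $\gamma_{xy} = \frac{\sigma_x^2 + \sigma_y^2 - \sigma_{x-y}^2}{2\sigma_x \sigma_y}$


Proof. $\sigma_{x-y}^2 = \frac{1}{n}\sum \left[(x_i - y_i) - (\bar{x} - \bar{y})\right]^2$

$= \frac{1}{n}\sum \left[(x_i - \bar{x}) - (y_i - \bar{y})\right]^2$

$= \frac{1}{n}\sum \left[(x_i - \bar{x})^2 - 2(x_i - \bar{x})(y_i - \bar{y}) + (y_i - \bar{y})^2\right]$

$= \sigma_x^2 - 2\gamma_{xy}\sigma_x \sigma_y + \sigma_y^2$

$\therefore \gamma_{xy} = \frac{\sigma_x^2 + \sigma_y^2 - \sigma_{x-y}^2}{2\sigma_x \sigma_y}$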


Solved Problems. 


Problem 1. Ten students obtained the following percentage of marks in 
the college internal test (x) and in the final university examination(y). 
Find the correlation coefficient between the marks of the two tests. 


Note 4. If the variables x and y are uncorrelated then Cov(x,y)- 0. 


X51 |63 |63 |49 |50 |60 |65 |63 |46 


50 


y 149 |72 |75 |50 |48 |60 |70 |48 |60 


56 


62 
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NOTES 


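Solution. Choosing the origin A = 63 for the variable x and B = 60 for y
and taking $u_i = x_i - A$ and $v_i = y_i - B$ we have the following table.

x_i   | u_i | y_i | v_i | u_i^2 | v_i^2 | u_i v_i
51    | -12 | 49  | -11 | 144   | 121   | 132
63    | 0   | 72  | 12  | 0     | 144   | 0
63    | 0   | 75  | 15  | 0     | 225   | 0
49    | -14 | 50  | -10 | 196   | 100   | 140
50    | -13 | 48  | -12 | 169   | 144   | 156
60    | -3  | 60  | 0   | 9     | 0     | 0
65    | 2   | 70  | 10  | 4     | 100   | 20
63    | 0   | 48  | -12 | 0     | 144   | 0
46    | -17 | 60  | 0   | 289   | 0     | 0
50    | -13 | 56  | -4  | 169   | 16    | 52
Total | -70 | -   | -12 | 980   | 994   | 500

$\gamma_{xy} = \gamma_{uv}$ (by Theorem 7.2)

$= \frac{n\sum u_i v_i - \sum u_i \sum v_i}{\left[n\sum u_i^2 - \left(\sum u_i\right)^2\right]^{1/2}\left[n\sum v_i^2 - \left(\sum v_i\right)^2\right]^{1/2}}$

$= \frac{10 \times 500 - (-70) \times (-12)}{\left[10 \times 980 - 70^2\right]^{1/2}\left[10 \times 994 - 12^2\right]^{1/2}}$

$= \frac{4160}{70 \times 98.97}$

= 0.6 (verify)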
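
The "(verify)" step can be carried out with a short script. This is a sketch only: the function name `pearson_shifted` and its parameters `a` and `b` (the assumed origins A and B) are illustrative and not part of the text.

```python
from math import sqrt

def pearson_shifted(xs, ys, a=0, b=0):
    """Correlation coefficient computed from deviations u = x - a, v = y - b."""
    n = len(xs)
    us = [x - a for x in xs]
    vs = [y - b for y in ys]
    su, sv = sum(us), sum(vs)
    suu = sum(u * u for u in us)
    svv = sum(v * v for v in vs)
    suv = sum(u * v for u, v in zip(us, vs))
    num = n * suv - su * sv
    den = sqrt(n * suu - su ** 2) * sqrt(n * svv - sv ** 2)
    return num / den

x = [51, 63, 63, 49, 50, 60, 65, 63, 46, 50]
y = [49, 72, 75, 50, 48, 60, 70, 48, 60, 56]
r = pearson_shifted(x, y, a=63, b=60)
print(round(r, 2))  # 0.6
```

Calling `pearson_shifted(x, y)` with the default origin 0 gives the same value, which illustrates Theorem 7.2.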


Problem 2. If x and y are two variables, prove that the correlation
coefficient between ax+b and cy+d is

$\gamma_{ax+b,\,cy+d} = \frac{ac}{|ac|}\gamma_{xy}$ if $a, c \ne 0$.

Proof. Let u = ax+b and v = cy+d.
$\therefore \bar{u} = a\bar{x} + b$ and $\bar{v} = c\bar{y} + d$.

$\sigma_u^2 = \frac{1}{n}\sum (u - \bar{u})^2 = \frac{a^2}{n}\sum (x_i - \bar{x})^2 = a^2\sigma_x^2$

Similarly, $\sigma_v^2 = c^2\sigma_y^2$.

Now, $\gamma_{uv} = \frac{\sum (u - \bar{u})(v - \bar{v})}{n\sigma_u\sigma_v} = \frac{\sum a(x - \bar{x})\,c(y - \bar{y})}{n\,|a|\sigma_x\,|c|\sigma_y}$

$= \frac{ac}{|ac|}\gamma_{xy}$


Problem 3. A programmer, while writing a program for the correlation
coefficient between two variables x and y from 30 pairs of observations,
obtained the following results: $\sum x = 300$; $\sum x^2 = 3718$;
$\sum y = 210$; $\sum y^2 = 2000$; $\sum xy = 2100$.

At the time of checking it was found that he had copied down two pairs
$(x_i, y_i)$ as (18, 20) and (12, 10) instead of the correct values (10, 15) and
(20, 15). Obtain the correct value of the correlation coefficient.

Solution. Corrected $\sum x = 300 - 18 - 12 + 10 + 20 = 300$
Corrected $\sum y = 210 - 20 - 10 + 15 + 15 = 210$
Corrected $\sum x^2 = 3718 - 18^2 - 12^2 + 10^2 + 20^2 = 3750$
Corrected $\sum y^2 = 2000 - 20^2 - 10^2 + 15^2 + 15^2 = 1950$
Corrected $\sum xy = 2100 - (18 \times 20) - (12 \times 10) + (10 \times 15) + (20 \times 15) = 2070$

After correction the correlation coefficient is

$\gamma_{xy} = \frac{n\sum xy - \sum x \sum y}{\left[n\sum x^2 - \left(\sum x\right)^2\right]^{1/2}\left[n\sum y^2 - \left(\sum y\right)^2\right]^{1/2}}$

$= \frac{30 \times 2070 - 300 \times 210}{(112500 - 90000)^{1/2}(58500 - 44100)^{1/2}}$

$= \frac{-900}{(22500)^{1/2}(14400)^{1/2}} = -\frac{900}{150 \times 120} = -\frac{1}{20}$

= -0.05

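The correction procedure above can be sketched in code. The helper names `correct_sums` and `corr_from_sums` are hypothetical; the sketch assumes the five sums are held in the order (sum x, sum y, sum x^2, sum y^2, sum xy).

```python
from math import sqrt

def correct_sums(sums, wrong_pairs, right_pairs):
    """Return corrected (sum_x, sum_y, sum_x2, sum_y2, sum_xy)."""
    sx, sy, sxx, syy, sxy = sums
    for x, y in wrong_pairs:       # remove the miscopied pairs
        sx, sy = sx - x, sy - y
        sxx, syy, sxy = sxx - x * x, syy - y * y, sxy - x * y
    for x, y in right_pairs:       # add the correct pairs
        sx, sy = sx + x, sy + y
        sxx, syy, sxy = sxx + x * x, syy + y * y, sxy + x * y
    return sx, sy, sxx, syy, sxy

def corr_from_sums(n, sums):
    sx, sy, sxx, syy, sxy = sums
    return (n * sxy - sx * sy) / (sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))

sums = correct_sums((300, 210, 3718, 2000, 2100),
                    wrong_pairs=[(18, 20), (12, 10)],
                    right_pairs=[(10, 15), (20, 15)])
print(sums)                      # (300, 210, 3750, 1950, 2070)
print(corr_from_sums(30, sums))  # -0.05
```
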
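Problem 4. If x, y and z are uncorrelated variables each having the same
standard deviation, obtain the coefficient of correlation between x+y and
y+z.
Solution. Given $\sigma_x = \sigma_y = \sigma_z = \sigma$ (say).
x and y are uncorrelated $\Rightarrow \sum (x - \bar{x})(y - \bar{y}) = 0$

y and z are uncorrelated $\Rightarrow \sum (y - \bar{y})(z - \bar{z}) = 0$

z and x are uncorrelated $\Rightarrow \sum (z - \bar{z})(x - \bar{x}) = 0$

Let u = x+y and v = y+z.
$\therefore \bar{u} = \bar{x} + \bar{y}$ and $\bar{v} = \bar{y} + \bar{z}$.

Now, $\sigma_u^2 = \frac{1}{n}\sum (u - \bar{u})^2 = \frac{1}{n}\sum \left[(x - \bar{x}) + (y - \bar{y})\right]^2$

$= \frac{1}{n}\left[\sum (x - \bar{x})^2 + \sum (y - \bar{y})^2 + 2\sum (x - \bar{x})(y - \bar{y})\right]$
$= \sigma_x^2 + \sigma_y^2$ (since $\sum (x - \bar{x})(y - \bar{y}) = 0$)
$= 2\sigma^2$

Similarly $\sigma_v^2 = 2\sigma^2$.

Now, $\sum (u - \bar{u})(v - \bar{v}) = \sum \left[(x - \bar{x}) + (y - \bar{y})\right]\left[(y - \bar{y}) + (z - \bar{z})\right]$

$= \sum (x - \bar{x})(y - \bar{y}) + \sum (y - \bar{y})^2 + \sum (z - \bar{z})(x - \bar{x}) + \sum (y - \bar{y})(z - \bar{z})$

$= 0 + n\sigma_y^2 + 0 + 0 = n\sigma^2$

$\therefore \gamma_{uv} = \frac{\sum (u - \bar{u})(v - \bar{v})}{n\sigma_u\sigma_v} = \frac{n\sigma^2}{n(2\sigma^2)} = \frac{1}{2}$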


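Problem 5. Show that the variables $u = x\cos\alpha + y\sin\alpha$ and $v = y\cos\alpha - x\sin\alpha$
are uncorrelated if $\alpha = \frac{1}{2}\tan^{-1}\left(\frac{2\gamma_{xy}\sigma_x\sigma_y}{\sigma_x^2 - \sigma_y^2}\right)$.

Solution. $u_i = x_i\cos\alpha + y_i\sin\alpha$ and $v_i = y_i\cos\alpha - x_i\sin\alpha$

$\therefore \bar{u} = \bar{x}\cos\alpha + \bar{y}\sin\alpha$ and $\bar{v} = \bar{y}\cos\alpha - \bar{x}\sin\alpha$

$u_i - \bar{u} = (x_i - \bar{x})\cos\alpha + (y_i - \bar{y})\sin\alpha$ and $v_i - \bar{v} = (y_i - \bar{y})\cos\alpha - (x_i - \bar{x})\sin\alpha$

The variables $u_i$ and $v_i$ are uncorrelated if $\sum (u_i - \bar{u})(v_i - \bar{v}) = 0$

$\therefore \sum \left[(x_i - \bar{x})\cos\alpha + (y_i - \bar{y})\sin\alpha\right]\left[(y_i - \bar{y})\cos\alpha - (x_i - \bar{x})\sin\alpha\right] = 0$

$\therefore \cos^2\alpha \sum (x_i - \bar{x})(y_i - \bar{y}) - \sin^2\alpha \sum (x_i - \bar{x})(y_i - \bar{y}) - \cos\alpha\sin\alpha\left[\sum (x_i - \bar{x})^2 - \sum (y_i - \bar{y})^2\right] = 0$

$\therefore n\gamma_{xy}\sigma_x\sigma_y(\cos^2\alpha - \sin^2\alpha) = n\cos\alpha\sin\alpha\,(\sigma_x^2 - \sigma_y^2)$

$\therefore \tan 2\alpha = \frac{2\gamma_{xy}\sigma_x\sigma_y}{\sigma_x^2 - \sigma_y^2}$

$\therefore \alpha = \frac{1}{2}\tan^{-1}\left(\frac{2\gamma_{xy}\sigma_x\sigma_y}{\sigma_x^2 - \sigma_y^2}\right)$
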
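Problem 6. Show that if X' and Y' are the deviations of the random
variables X and Y from their respective means then

(i) $\gamma = 1 - \frac{1}{2N}\sum \left(\frac{X_i'}{\sigma_X} - \frac{Y_i'}{\sigma_Y}\right)^2$

(ii) $\gamma = -1 + \frac{1}{2N}\sum \left(\frac{X_i'}{\sigma_X} + \frac{Y_i'}{\sigma_Y}\right)^2$.

Deduce that $-1 \le \gamma \le 1$.


Solution. (i) Given that $X_i' = X_i - \bar{X}$ and $Y_i' = Y_i - \bar{Y}$.

$1 - \frac{1}{2N}\sum \left(\frac{X_i'}{\sigma_X} - \frac{Y_i'}{\sigma_Y}\right)^2 = 1 - \frac{1}{2N}\sum \left(\frac{X_i'^2}{\sigma_X^2} + \frac{Y_i'^2}{\sigma_Y^2} - \frac{2X_i'Y_i'}{\sigma_X\sigma_Y}\right)$

$= 1 - \frac{1}{2N}\left[\frac{\sum (X_i - \bar{X})^2}{\sigma_X^2} + \frac{\sum (Y_i - \bar{Y})^2}{\sigma_Y^2} - \frac{2\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sigma_X\sigma_Y}\right]$

$= 1 - \frac{1}{2N}\left[N + N - 2\gamma N\right] = 1 - \frac{1}{2N}\left[2N - 2\gamma N\right]$

$= 1 - (1 - \gamma) = \gamma$

(ii) can similarly be proved.

Since $\sum \left(\frac{X_i'}{\sigma_X} - \frac{Y_i'}{\sigma_Y}\right)^2$ is never negative, $\frac{1}{2N}\sum \left(\frac{X_i'}{\sigma_X} - \frac{Y_i'}{\sigma_Y}\right)^2$ is
also never negative.

Hence $1 - \frac{1}{2N}\sum \left(\frac{X_i'}{\sigma_X} - \frac{Y_i'}{\sigma_Y}\right)^2 \le 1$

$\therefore$ By (i) $\gamma \le 1$. Similarly by (ii) $-1 \le \gamma$.

Hence $-1 \le \gamma \le 1$.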


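Problem 7. Let x, y be two variables with standard deviations $\sigma_x$ and $\sigma_y$
respectively. If $u = x + ky$ and $v = x + \left(\frac{\sigma_x}{\sigma_y}\right)y$ and $\gamma_{uv} = 0$ (i.e. u and v are
uncorrelated), find the value of k.

Solution. $u = x + ky \Rightarrow \bar{u} = \bar{x} + k\bar{y}$
$\therefore u - \bar{u} = (x - \bar{x}) + k(y - \bar{y})$ and $v - \bar{v} = (x - \bar{x}) + \left(\frac{\sigma_x}{\sigma_y}\right)(y - \bar{y})$
Now, $\gamma_{uv} = 0 \Rightarrow \mathrm{Cov}(u, v) = 0$
$\Rightarrow \sum (u - \bar{u})(v - \bar{v}) = 0$
$\Rightarrow \sum \left[(x - \bar{x}) + k(y - \bar{y})\right]\left[(x - \bar{x}) + \left(\frac{\sigma_x}{\sigma_y}\right)(y - \bar{y})\right] = 0$
$\Rightarrow \sum (x - \bar{x})^2 + k\left(\frac{\sigma_x}{\sigma_y}\right)\sum (y - \bar{y})^2 + k\sum (x - \bar{x})(y - \bar{y}) + \left(\frac{\sigma_x}{\sigma_y}\right)\sum (x - \bar{x})(y - \bar{y}) = 0$
$\Rightarrow n\sigma_x^2 + nk\left(\frac{\sigma_x}{\sigma_y}\right)\sigma_y^2 + n\gamma_{xy}\sigma_x\sigma_y\left(k + \frac{\sigma_x}{\sigma_y}\right) = 0$
$\Rightarrow n\sigma_x\left[\sigma_x + k\sigma_y + \gamma_{xy}(k\sigma_y + \sigma_x)\right] = 0$
$\Rightarrow n\sigma_x(\sigma_x + k\sigma_y)(1 + \gamma_{xy}) = 0$
$\Rightarrow \sigma_x + k\sigma_y = 0$ or $\gamma_{xy} + 1 = 0$ or $\sigma_x = 0$
If $\gamma_{xy} \ne -1$ and $\sigma_x \ne 0$ we get $k = -\frac{\sigma_x}{\sigma_y}$.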


Exercises.


1. Find the correlation coefficient for the following data.

(i)
X | 10 | 12 | 18 | 24 | 23 | 27
Y | 13 | 18 | 12 | 25 | 30 | 10

(ii)
X | 20 | 18 | 16 | 15 | 14 | 12 | 12 | 10 | 8
Y | 12 | 14 | 10 | 14 | 12 | 10 | 9  | 8  | 7

(iii)
Age of husband | 23 | 27 | 28 | 29 | 30 | 31 | 33 | 35 | 36 | 39
Age of wife    | 18 | 22 | 23 | 24 | 25 | 26 | 28 | 29 | 30 | 32

7.3 RANK CORRELATION


Suppose that a group of n individuals are arranged in the order of
merit or efficiency with respect to some characteristic. Then the rank is
a variable which takes only the values 1, 2, 3, ..., n, assuming that there is
no tie. Hence $\bar{x} = \frac{1 + 2 + \cdots + n}{n} = \frac{n+1}{2}$ and the variance is given by
$\sigma_x^2 = \frac{1}{12}(n^2 - 1)$.

Now suppose that the same individuals are ranked in two ways on
the basis of different characteristics or by two different persons for a
single characteristic. Let $x_i$ and $y_i$ be the ranks of the i-th individual in
the first and second ranking respectively. The coefficient of correlation
between the ranks $x_i$ and $y_i$ is called the rank correlation coefficient and
is denoted by $\rho$.


Theorem 7.5. Rank correlation $\rho$ is given by $\rho = 1 - \frac{6\sum (x - y)^2}{n(n^2 - 1)}$


Proof. Consider a collection of n individuals. Let $x_i$ and $y_i$ be the ranks
of the i-th individual in the two different rankings.

$\therefore \bar{x} = \frac{1}{n}\sum x_i = \frac{1}{n}(1 + 2 + \cdots + n) = \frac{n+1}{2}$
$\therefore \bar{x} = \bar{y}$ and $\sigma_x^2 = \frac{1}{12}(n^2 - 1) = \sigma_y^2$
Now, $\sum (x - y)^2 = \sum \left[(x - \bar{x}) - (y - \bar{y})\right]^2$ (since $\bar{x} = \bar{y}$)
$= \sum (x - \bar{x})^2 + \sum (y - \bar{y})^2 - 2\sum (x - \bar{x})(y - \bar{y})$
$= n\sigma_x^2 + n\sigma_y^2 - 2n\rho\sigma_x\sigma_y$
$= 2n\sigma_x^2(1 - \rho)$ (since $\sigma_x^2 = \sigma_y^2$)
$= \frac{1}{6}n(n^2 - 1)(1 - \rho)$
$\therefore 1 - \rho = \frac{6\sum (x - y)^2}{n(n^2 - 1)}$

$\therefore \rho = 1 - \frac{6\sum (x - y)^2}{n(n^2 - 1)}$

Note 1. This is known as Spearman's formula for the rank correlation
coefficient.


Note 2. If two or more individuals get the same rank in the ranking
process with respect to different characteristics then Spearman's formula
for calculating the rank correlation will not apply, since in this case $\bar{x} \ne \bar{y}$.
In such a case we assign a common rank to the repeated values. This
common rank is the average of the ranks which these items would have
assumed if their ranks were different from each other, and the next item
will get the rank next to the ranks already assumed. As a result of this, in
the formula for $\rho$ we add the factor $\frac{1}{12}m(m^2 - 1)$ to $\sum (x - y)^2$, where m
is the number of times an item has repeated values. This correction factor
is added for each repeated rank of the variables x and y. For example,
after assigning rank 2, if four items get the same rank 3 then these four
items are given the common rank $\frac{1}{4}(3 + 4 + 5 + 6) = 4.5$ and the next
item is given rank 7. In this case the correction factor to be added is
$\frac{1}{12} \times 4 \times (4^2 - 1) = 5$.

Problem 1. Find the rank correlation coefficient between the height in
cm and weight in kg of 6 soldiers in the Indian Army.

Height | 165 | 167 | 166  | 170 | 169  | 172
Weight | 61  | 60  | 63.5 | 63  | 61.5 | 64

Solution.

Height | Rank in height (x) | Weight | Rank in weight (y) | x - y | (x - y)^2
165    | 6                  | 61     | 5                  | 1     | 1
167    | 4                  | 60     | 6                  | -2    | 4
166    | 5                  | 63.5   | 2                  | 3     | 9
170    | 2                  | 63     | 3                  | -1    | 1
169    | 3                  | 61.5   | 4                  | -1    | 1
172    | 1                  | 64     | 1                  | 0     | 0
Total  | -                  | -      | -                  | -     | 16

$\rho = 1 - \frac{6\sum (x - y)^2}{n(n^2 - 1)} = 1 - \frac{6 \times 16}{6 \times 35} = 1 - 0.457$
= 0.543

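The calculation can be reproduced with a few lines of code. This is a minimal sketch that assumes no ties and gives rank 1 to the largest value, as in the table above; the function names are illustrative only.

```python
def ranks_descending(values):
    """Rank 1 for the largest value; assumes all values are distinct."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_rho(xs, ys):
    """Spearman's formula: 1 - 6 * sum(d^2) / (n (n^2 - 1))."""
    rx, ry = ranks_descending(xs), ranks_descending(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

heights = [165, 167, 166, 170, 169, 172]
weights = [61, 60, 63.5, 63, 61.5, 64]
print(round(spearman_rho(heights, weights), 3))  # 0.543
```
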
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Problem 2. From the following data of marks obtained by 10 students in 
Physics and Chemistry, calculate the rank correlation coefficient.

Physics (P)   | 35 | 56 | 50 | 65 | 44 | 38 | 44 | 50 | 15 | 26
Chemistry (Q) | 50 | 35 | 70 | 25 | 35 | 58 | 75 | 60 | 55 | 35

Solution. We rank the marks of Physics and Chemistry and we have the
following table.

P     | Rank in P (x) | Q  | Rank in Q (y) | x - y | (x - y)^2
35    | 8             | 50 | 6             | 2     | 4
56    | 2             | 35 | 8             | -6    | 36
50    | 3.5           | 70 | 2             | 1.5   | 2.25
65    | 1             | 25 | 10            | -9    | 81
44    | 5.5           | 35 | 8             | -2.5  | 6.25
38    | 7             | 58 | 4             | 3     | 9
44    | 5.5           | 75 | 1             | 4.5   | 20.25
50    | 3.5           | 60 | 3             | 0.5   | 0.25
15    | 10            | 55 | 5             | 5     | 25
26    | 9             | 35 | 8             | 1     | 1
Total | -             | -  | -             | -     | 185

We observe that in the values of x the marks 50 and 44 each occur twice.
In the values of y the mark 35 occurs thrice. Hence in the calculation of
the rank correlation coefficient $\sum (x - y)^2$ is to be corrected by adding
the following correction factors:

$\frac{2(2^2 - 1)}{12} + \frac{2(2^2 - 1)}{12} + \frac{3(3^2 - 1)}{12} = 3$

$\therefore$ After correction $\sum (x - y)^2 = 188$

Now, $\rho = 1 - \frac{6\sum (x - y)^2}{n(n^2 - 1)} = 1 - \frac{6 \times 188}{10 \times 99} = 1 - \frac{1128}{990}$
= 1 - 1.139 = -0.139

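The tie-handling rule of Note 2 can be sketched as follows. The function names are hypothetical; the sketch assumes rank 1 goes to the largest mark and that tied values share the average of the ranks they would otherwise occupy.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 for the largest value; tied values share the mean of their ranks."""
    order = sorted(values, reverse=True)
    first = {}
    for pos, v in enumerate(order, start=1):
        first.setdefault(v, pos)          # first position occupied by v
    counts = Counter(values)
    # A value occupying positions p .. p+m-1 gets the rank p + (m - 1)/2.
    return [first[v] + (counts[v] - 1) / 2 for v in values]

def tie_correction(values):
    """Sum of m(m^2 - 1)/12 over every value repeated m times."""
    return sum(m * (m * m - 1) / 12 for m in Counter(values).values() if m > 1)

def spearman_rho_ties(xs, ys):
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(average_ranks(xs), average_ranks(ys)))
    d2 += tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

physics = [35, 56, 50, 65, 44, 38, 44, 50, 15, 26]
chemistry = [50, 35, 70, 25, 35, 58, 75, 60, 55, 35]
print(round(spearman_rho_ties(physics, chemistry), 3))  # -0.139
```
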
Problem 3. Three judges assign the ranks to 8 entries in a beauty contest.

Judge X | 1 | 2 | 4 | 3 | 7 | 6 | 5 | 8
Judge Y | 3 | 2 | 1 | 5 | 4 | 7 | 6 | 8
Judge Z | 1 | 2 | 3 | 4 | 5 | 7 | 8 | 6

Which pair of judges has the nearest approach to common taste in beauty?

Solution.

x     | y | z | x - y | (x - y)^2 | y - z | (y - z)^2 | z - x | (z - x)^2
1     | 3 | 1 | -2    | 4         | 2     | 4         | 0     | 0
2     | 2 | 2 | 0     | 0         | 0     | 0         | 0     | 0
4     | 1 | 3 | 3     | 9         | -2    | 4         | -1    | 1
3     | 5 | 4 | -2    | 4         | 1     | 1         | 1     | 1
7     | 4 | 5 | 3     | 9         | -1    | 1         | -2    | 4
6     | 7 | 7 | -1    | 1         | 0     | 0         | 1     | 1
5     | 6 | 8 | -1    | 1         | -2    | 4         | 3     | 9
8     | 8 | 6 | 0     | 0         | 2     | 4         | -2    | 4
Total |   |   | -     | 28        | -     | 18        | -     | 20

$\rho_{xy} = 1 - \frac{6\sum (x - y)^2}{n(n^2 - 1)} = 1 - \frac{6 \times 28}{8(8^2 - 1)} = 1 - \frac{168}{504} = 1 - 0.333 = 0.667$

$\rho_{yz} = 1 - \frac{6 \times 18}{8(8^2 - 1)} = 1 - \frac{108}{504} = 1 - 0.214 = 0.786$

$\rho_{zx} = 1 - \frac{6 \times 20}{8(8^2 - 1)} = 1 - \frac{120}{504} = 1 - 0.238 = 0.762$

Since $\rho_{yz}$ is greater than $\rho_{xy}$ and $\rho_{zx}$, the judges Y and Z have the
nearest approach to common taste in beauty.

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Problem 4. The coefficient of rank correlation of marks obtained by 10 
students in Mathematics and Physics was found to be 0.8. It was later 
discovered that the difference in ranks in two subjects obtained by one of 
the students was wrongly taken as 5 instead of 8. Find the correct 
coefficient of rank correlation. 


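Solution. $\rho = 1 - \frac{6\sum (x - y)^2}{n(n^2 - 1)}$
Given $\rho = 0.8$ and n = 10.

$\therefore 0.8 = 1 - \frac{6\sum (x - y)^2}{10(10^2 - 1)} = 1 - \frac{6\sum (x - y)^2}{990}$

$\therefore \frac{6\sum (x - y)^2}{990} = 1 - 0.8 = 0.2$

$\therefore 6\sum (x - y)^2 = 990 \times 0.2 = 198$
$\therefore \sum (x - y)^2 = 33$

Corrected $\sum (x - y)^2 = 33 - 5^2 + 8^2 = 72$

Now, after correction, $\rho = 1 - \frac{6 \times 72}{990} = 1 - \frac{432}{990} = 1 - 0.436$

= 0.564
The correct coefficient of rank correlation is 0.564.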

Problem 5. Let $x_1, x_2, \ldots, x_n$ be the ranks of n individuals according to a
character A and $y_1, y_2, \ldots, y_n$ the ranks of the same individuals according
to another character B. It is given that $x_i + y_i = 1 + n$ for i = 1, 2, 3, ..., n.
Show that the value of the rank correlation coefficient $\rho$ between the
characters A and B is -1.

Solution. Given $x_i + y_i = 1 + n$ ........ (1)

Let the difference of ranks be $d_i$.

$\therefore x_i - y_i = d_i$ ........ (2)

Adding (1) and (2) we get $2x_i = 1 + n + d_i$
$\therefore d_i = 2x_i - (n + 1)$
Now, $\sum d_i^2 = \sum \left[2x_i - (n + 1)\right]^2$
$= \sum \left[4x_i^2 + (n + 1)^2 - 4(n + 1)x_i\right]$
$= 4\sum x_i^2 + n(n + 1)^2 - 4(n + 1)\sum x_i$
$= 4\left[\frac{n(n + 1)(2n + 1)}{6}\right] + n(n + 1)^2 - 4(n + 1)\left[\frac{n(n + 1)}{2}\right]$
$= n(n + 1)\left[\frac{2}{3}(2n + 1) + (n + 1) - 2(n + 1)\right]$
$= n(n + 1)\left[\frac{4n + 2 - 3n - 3}{3}\right]$
$= n(n + 1)\,\frac{1}{3}(n - 1)$
$= \frac{1}{3}n(n^2 - 1)$

Now $\rho = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6\left[\frac{1}{3}n(n^2 - 1)\right]}{n(n^2 - 1)} = 1 - 2$
= -1

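The result can also be checked numerically for small n. This sketch (with an illustrative helper name) builds the ranks $x_i = i$ and $y_i = n + 1 - x_i$ and evaluates Spearman's formula.

```python
def rho_from_ranks(rx, ry):
    """Spearman's formula applied directly to two lists of ranks."""
    n = len(rx)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

results = {}
for n in range(2, 8):
    rx = list(range(1, n + 1))
    ry = [n + 1 - r for r in rx]          # x_i + y_i = n + 1
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    assert 3 * d2 == n * (n ** 2 - 1)     # sum of d_i^2 equals n(n^2 - 1)/3
    results[n] = rho_from_ranks(rx, ry)
print(results)
```

Every value in `results` should be -1.0, in agreement with the algebraic proof.
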
Exercises.


1. Calculate the rank correlation coefficient for the following data.

(i)
X | 5 | 2 | 8 | 1 | 4 | 6 | 3 | 7
Y | 4 | 5 | 7 | 3 | 2 | 8 | 1 | 6

(ii)
X | 10 | 12 | 18 | 18 | 15 | 40
Y | 12 | 18 | 25 | 25 | 50 | 25

2. Two judges in a beauty contest rank the ten competitors in the
following order.

First judge  | 6 | 4 | 3 | 1 | 2 | 7 | 9  | 8 | 10
Second judge | 4 | 1 | 6 | 7 | 5 | 8 | 10 | 9 | 3

Do the judges appear to agree in their standard?

3. Ten students got the following percentage of marks in two subjects.

Economics  | 78 | 65 | 36 | 98 | 25 | 75 | 82 | 92 | 62 | 39
Statistics | 84 | 53 | 51 | 91 | 60 | 68 | 62 | 86 | 58 | 47

Calculate the rank correlation coefficient.

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UNIT-VIII REGRESSION 


8.1 INTRODUCTION 


There are two main problems involved in the relationship
between x and y. The first is to find a measure of the degree of
association or correlation between the values of x and those of y. The
second problem is to find the most suitable form of equation for
determining the probable value of one variable corresponding to a given
value of the other. This is the problem of regression.

If there is a functional relationship between the two variables
$x_i$ and $y_i$, the points in the scatter diagram will cluster around some curve
called the curve of regression. If the curve is a straight line it is called a
line of regression between the two variables.


8.2 Definition. If we fit a straight line by the principle of least squares to
the points of the scatter diagram in such a way that the sum of the squares
of the distances parallel to the y-axis from the points to the line is
minimized, we obtain a line of best fit for the data and it is called the
regression line of y on x.
Similarly we can define the regression line of x on y.

Theorem 8.1 The equation of the regression line of y on x is given by

$y - \bar{y} = \gamma\frac{\sigma_y}{\sigma_x}(x - \bar{x})$


Proof. Let y = ax + b be the line of regression of y on x.

According to the principle of least squares the constants a and b are
to be determined in such a way that $S = \sum \left[y_i - (ax_i + b)\right]^2$ is minimum.

$\frac{\partial S}{\partial a} = 0 \Rightarrow -2\sum (y_i - ax_i - b)x_i = 0$
$\Rightarrow \sum x_i y_i - a\sum x_i^2 - b\sum x_i = 0$ ........ (1)

$\frac{\partial S}{\partial b} = 0 \Rightarrow -2\sum (y_i - ax_i - b) = 0$
$\Rightarrow \sum y_i = a\sum x_i + nb$ ........ (2)

Equations (1) and (2) are called normal equations.

From (2) we obtain $\bar{y} = a\bar{x} + b$ ........ (3)
$\therefore$ The line of regression passes through the point $(\bar{x}, \bar{y})$.
Now, shifting the origin to this point $(\bar{x}, \bar{y})$ by means of the
transformation $X_i = x_i - \bar{x}$ and $Y_i = y_i - \bar{y}$ we obtain $\sum X_i = 0 = \sum Y_i$,
and the equation of the line of regression becomes Y = aX ........ (4)

Corresponding to this line Y = aX, the constant a can be
determined from the normal equation $a\sum X_i^2 = \sum X_i Y_i$.

$\therefore a = \frac{\sum X_i Y_i}{\sum X_i^2} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{n\gamma\sigma_x\sigma_y}{n\sigma_x^2} = \gamma\frac{\sigma_y}{\sigma_x}$

$\therefore$ The required regression line (4) becomes $Y = \left(\gamma\frac{\sigma_y}{\sigma_x}\right)X$

$\therefore y - \bar{y} = \gamma\frac{\sigma_y}{\sigma_x}(x - \bar{x})$


Theorem 8.2 The equation of the regression line of x on y is given by

$x - \bar{x} = \gamma\frac{\sigma_x}{\sigma_y}(y - \bar{y})$

Proof. The proof is similar to that of Theorem 8.1.


Note. $(\bar{x}, \bar{y})$ is the point of intersection of the two regression lines.


Definition. The slope of the regression line of y on x is called the
regression coefficient of y on x and it is denoted by $b_{yx}$. Hence
$b_{yx} = \gamma\frac{\sigma_y}{\sigma_x}$.
The regression coefficient of x on y is given by $b_{xy} = \gamma\frac{\sigma_x}{\sigma_y}$.


We now give some properties of the regression coefficients. 


Theorem 8.3. The correlation coefficient is the geometric mean between
the regression coefficients, (i.e.) $\gamma = \pm\sqrt{b_{xy}b_{yx}}$

Proof. We have $b_{yx} = \gamma\frac{\sigma_y}{\sigma_x}$ and $b_{xy} = \gamma\frac{\sigma_x}{\sigma_y}$

$\therefore b_{xy}b_{yx} = \gamma^2$

$\therefore \gamma = \pm\sqrt{b_{xy}b_{yx}}$


Note. The sign of the correlation coefficient is the same as that of the
regression coefficients.

Theorem 8.4. If one of the regression coefficients is greater than unity,
the other is less than unity.


Proof. We have $b_{xy}b_{yx} = \gamma^2 \le 1$, so that $b_{xy}b_{yx} \le 1$.
Hence $b_{xy} > 1 \Rightarrow b_{yx} < 1$.

Hence the theorem.


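Theorem 8.5. The arithmetic mean of the regression coefficients is greater
than or equal to the correlation coefficient.


Proof. Let $b_{xy}$ and $b_{yx}$ be the regression coefficients.
We have to prove that $\frac{1}{2}(b_{xy} + b_{yx}) \ge \gamma$.
Now, $\frac{1}{2}(b_{xy} + b_{yx}) \ge \gamma \iff b_{yx} + b_{xy} \ge 2\gamma$
$\iff \gamma\frac{\sigma_y}{\sigma_x} + \gamma\frac{\sigma_x}{\sigma_y} \ge 2\gamma$
$\iff \frac{\sigma_y}{\sigma_x} + \frac{\sigma_x}{\sigma_y} \ge 2$ (dividing by $\gamma$, taken to be positive)
$\iff \sigma_x^2 + \sigma_y^2 - 2\sigma_x\sigma_y \ge 0$
$\iff (\sigma_x - \sigma_y)^2 \ge 0$
This is always true. Hence the theorem.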


Theorem 8.6 Regression coefficients are independent of the change of 
origin but dependent on change of scale. 
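Proof. Let $u_i = \frac{x_i - A}{h}$ and $v_i = \frac{y_i - B}{k}$.
$\therefore x_i = A + hu_i$ and $y_i = B + kv_i$

We know that $\sigma_x = h\sigma_u$, $\sigma_y = k\sigma_v$ and $\gamma_{xy} = \gamma_{uv}$.

Now, $b_{yx} = \gamma_{xy}\frac{\sigma_y}{\sigma_x} = \gamma_{uv}\left(\frac{k\sigma_v}{h\sigma_u}\right) = \frac{k}{h}b_{vu}$ ........ (1)
Similarly $b_{xy} = \frac{h}{k}b_{uv}$ ........ (2)

From (1) and (2) we observe that $b_{yx}$ and $b_{xy}$ depend upon the scales h
and k but not on the origins A and B.

Hence the theorem.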




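Theorem 8.7 The angle between the two regression lines is given by

$\theta = \tan^{-1}\left[\left(\frac{1 - \gamma^2}{\gamma}\right)\left(\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}\right)\right]$


Proof. The equations of the lines of regression of y on x and x on y
respectively are
$y - \bar{y} = \gamma\frac{\sigma_y}{\sigma_x}(x - \bar{x})$ ........ (1)
$x - \bar{x} = \gamma\frac{\sigma_x}{\sigma_y}(y - \bar{y})$ ........ (2)
(2) can also be written as $y - \bar{y} = \frac{1}{\gamma}\frac{\sigma_y}{\sigma_x}(x - \bar{x})$ ........ (3)

$\therefore$ The slopes of the two lines (1) and (2) are $\gamma\frac{\sigma_y}{\sigma_x}$ and $\frac{1}{\gamma}\frac{\sigma_y}{\sigma_x}$.

Let $\theta$ be the acute angle between the two lines of regression.

$\therefore \tan\theta = \pm\frac{\frac{1}{\gamma}\frac{\sigma_y}{\sigma_x} - \gamma\frac{\sigma_y}{\sigma_x}}{1 + \frac{\sigma_y^2}{\sigma_x^2}} = \pm\left(\frac{1 - \gamma^2}{\gamma}\right)\left(\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}\right)$

$= \left(\frac{1 - \gamma^2}{\gamma}\right)\left(\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}\right)$ (since $\gamma^2 \le 1$ and $\theta$ is acute).

$\therefore \theta = \tan^{-1}\left[\left(\frac{1 - \gamma^2}{\gamma}\right)\left(\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}\right)\right]$

Note 1. The obtuse angle between the regression lines is given by
$\tan^{-1}\left[\left(\frac{\gamma^2 - 1}{\gamma}\right)\left(\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}\right)\right]$


Note 2. If $\gamma = 0$ then $\tan\theta = \infty$. Hence $\theta = \pi/2$. Thus if the two variables are
uncorrelated then the lines of regression are perpendicular to each other.


Note 3. If $\gamma = \pm 1$ then $\tan\theta = 0$.

Hence $\theta = 0$ or $\pi$.

$\therefore$ The two lines of regression are parallel.

Further, the two lines have the common point $(\bar{x}, \bar{y})$ and hence
they must be coincident.

Therefore if there is a perfect correlation (positive or negative)
between the two variables then the two lines of regression coincide.
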
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Solved Problems. 


Problem 1. The following data relate to the marks of 10 students in the
internal test and the university examination for the maximum of 50 in
each.

Internal marks   | 25 | 28 | 30 | 32 | 35 | 36 | 38 | 39 | 42 | 45
University marks | 20 | 26 | 29 | 30 | 25 | 18 | 26 | 35 | 35 | 46

(i) Obtain the two regression equations and determine

(ii) the most likely internal mark for the university mark of 25

(iii) the most likely university mark for the internal mark of 30


Solution. (i) Let the marks of the internal test and the university examination
be denoted by x and y respectively.

We have $\bar{x} = \frac{1}{10}\sum x_i = 35$ and $\bar{y} = \frac{1}{10}\sum y_i = 29$.

For the calculation of regression we have the following table.

x_i   | x_i - 35 | (x_i - 35)^2 | y_i | y_i - 29 | (y_i - 29)^2 | (x_i - 35)(y_i - 29)
25    | -10      | 100          | 20  | -9       | 81           | 90
28    | -7       | 49           | 26  | -3       | 9            | 21
30    | -5       | 25           | 29  | 0        | 0            | 0
32    | -3       | 9            | 30  | 1        | 1            | -3
35    | 0        | 0            | 25  | -4       | 16           | 0
36    | 1        | 1            | 18  | -11      | 121          | -11
38    | 3        | 9            | 26  | -3       | 9            | -9
39    | 4        | 16           | 35  | 6        | 36           | 24
42    | 7        | 49           | 35  | 6        | 36           | 42
45    | 10       | 100          | 46  | 17       | 289          | 170
Total | 0        | 358          | -   | 0        | 598          | 324

$\sigma_x^2 = \frac{1}{n}\sum (x_i - \bar{x})^2 = \frac{1}{10}\sum (x_i - 35)^2 = 35.8$

$\sigma_y^2 = \frac{1}{n}\sum (y_i - \bar{y})^2 = \frac{1}{10}\sum (y_i - 29)^2 = 59.8$

$\therefore \sigma_x = 5.98$ and $\sigma_y = 7.73$

$\gamma = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n\sigma_x\sigma_y} = \frac{324}{10 \times 5.98 \times 7.73} = \frac{324}{462.254}$

= 0.7 (approximately)

Now, the regression line of y on x is $y - \bar{y} = \gamma\frac{\sigma_y}{\sigma_x}(x - \bar{x})$.

$\gamma\frac{\sigma_y}{\sigma_x} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n\sigma_x^2} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{324}{358} = 0.905$

Similarly $\gamma\frac{\sigma_x}{\sigma_y} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (y_i - \bar{y})^2} = \frac{324}{598} = 0.542$

The regression line of y on x is y - 29 = 0.905(x - 35)
(i.e.) y = 0.905x - 2.675 ........ (1)
The regression line of x on y is x - 35 = 0.542(y - 29)
(i.e.) x = 0.542y + 19.282 ........ (2)

(1) and (2) are the required regression equations.


(ii) The most likely internal mark for the university mark of 25 is got
from the regression equation of x on y by putting y = 25.

$\therefore$ x = 0.542 × 25 + 19.282 = 32.83


(iii) The most likely university mark for the internal mark of 30 is got
from the regression equation of y on x by putting x = 30.

$\therefore$ y = 0.905 × 30 - 2.675 = 24.475


Problem 2. For Solved Problem 1 on the correlation coefficient (Unit VII),
estimate the university examination mark of a student who got 61 in the
college internal test.


Solution. We have to find the equation of the regression line of y on x and
then estimate the value of y for the given value x = 61.

The regression line of y on x is given by $y - \bar{y} = \gamma\frac{\sigma_y}{\sigma_x}(x - \bar{x})$

$\bar{x} = A + h\bar{u} = 63 + \left(\frac{-70}{10}\right) = 56$

$\bar{y} = B + k\bar{v} = 60 + \left(\frac{-12}{10}\right) = 58.8$

$\sigma_u = \left[\frac{\sum u_i^2}{n} - \left(\frac{\sum u_i}{n}\right)^2\right]^{1/2} = \left[\frac{980}{10} - \left(\frac{-70}{10}\right)^2\right]^{1/2} = 7$

$\sigma_v = \left[\frac{\sum v_i^2}{n} - \left(\frac{\sum v_i}{n}\right)^2\right]^{1/2} = \left[\frac{994}{10} - \left(\frac{-12}{10}\right)^2\right]^{1/2} = 9.898$

Since the scale factor for $u_i$ and $v_i$ is one, we note $\sigma_x = \sigma_u$ and $\sigma_y = \sigma_v$.
We have $\gamma = 0.6$ from that problem.
$\therefore$ The regression equation of y on x is $y - 58.8 = 0.6\left(\frac{9.898}{7}\right)(x - 56)$
$\therefore$ 7y = 5.9388x + 79.0272
When x = 61, 7y = 5.9388 × 61 + 79.0272 = 441.294
$\therefore$ y = 63 (approximately)

$\therefore$ When the internal test mark is 61 the university examination
mark is estimated to be 63.
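
The estimate can be cross-checked directly from the raw marks. The sketch below fits the regression line of y on x by least squares; the function name is illustrative, and because it uses the unrounded slope instead of the rounded $\gamma = 0.6$ its intermediate figures differ slightly from those above while the rounded estimate agrees.

```python
def regression_y_on_x(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx              # b_yx = gamma * sigma_y / sigma_x
    return slope, my - slope * mx

internal = [51, 63, 63, 49, 50, 60, 65, 63, 46, 50]
university = [49, 72, 75, 50, 48, 60, 70, 48, 60, 56]
slope, intercept = regression_y_on_x(internal, university)
estimate = slope * 61 + intercept
print(round(estimate))  # 63
```

The regression line of x on y can be obtained from the same function by swapping the two arguments.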


Problem 3. Out of the two lines of regression given by x+2y-5=0 and
2x+3y-8=0, which one is the regression line of x on y?


Solution. Suppose x+2y-5=0 is the equation of the regression line of x
on y and 2x+3y-8=0 is the equation of the regression line of y on x.

Then the two equations can be written as $x = -2y + 5$ and $y = -\frac{2}{3}x + \frac{8}{3}$.
Hence the two regression coefficients are $b_{yx} = -\frac{2}{3}$ and $b_{xy} = -2$.
Now $\gamma^2 = b_{yx}b_{xy} = \frac{4}{3} > 1$. This is impossible.
Hence our assumption is wrong.

$\therefore$ 2x+3y-8=0 is the equation of the regression line of x on y.


Problem 4. The two variables x and y have the regression lines 3x+2y-26=0
and 6x+y-31=0.

Find (i) the mean values of x and y
(ii) the correlation coefficient between x and y
(iii) the variance of y if the variance of x is 25.

Solution. (i) Since the two lines of regression pass through $(\bar{x}, \bar{y})$ we have
$3\bar{x} + 2\bar{y} = 26$ ........ (1)
$6\bar{x} + \bar{y} = 31$ ........ (2)

Solving (1) and (2) we get $\bar{x} = 4$ and $\bar{y} = 7$.

(ii) As in the previous problem we can prove that $y = -\frac{3}{2}x + 13$
and $x = -\frac{1}{6}y + \frac{31}{6}$ represent the regression lines of y on x and x
on y respectively.

Hence we get the regression coefficients as $b_{yx} = -\frac{3}{2}$ and $b_{xy} = -\frac{1}{6}$.
Now, $\gamma^2 = \left(-\frac{3}{2}\right) \times \left(-\frac{1}{6}\right) = \frac{1}{4}$
$\therefore \gamma = \pm\frac{1}{2}$

Since both the regression coefficients are negative we take $\gamma = -\frac{1}{2}$.

(iii) Given $\sigma_x^2 = 25$, so that $\sigma_x = 5$.
We have $b_{yx} = \gamma\frac{\sigma_y}{\sigma_x}$
$\therefore -\frac{3}{2} = -\frac{1}{2}\left(\frac{\sigma_y}{5}\right)$
$\therefore \sigma_y = 15$, and the variance of y is $\sigma_y^2 = 225$.

Problem 5. If x = 4y+5 and y = kx+4 are the regression lines of x on y
and y on x respectively, (i) show that $0 \le k \le 1/4$.

(ii) If k = 1/8 find the means of the two variables x and y and the
correlation coefficient between them.


Solution. (i) The regression line of x on y is x = 4y+5.
Hence $b_{xy} = 4$.
The regression line of y on x is y = kx+4. Hence $b_{yx} = k$.
Now, $b_{xy}b_{yx} = \gamma^2 \Rightarrow 4k = \gamma^2$
Now, $0 \le \gamma^2 \le 1 \Rightarrow 0 \le 4k \le 1$
$\Rightarrow 0 \le k \le 1/4$
(ii) Given k = 1/8. Hence $b_{yx} = 1/8$.
$\therefore \gamma^2 = 4 \times \frac{1}{8} = \frac{1}{2}$
Hence $\gamma = \frac{1}{\sqrt{2}}$ (the positive value of $\gamma$ is taken since both regression
coefficients are positive).
Let $\bar{x}$ and $\bar{y}$ be the means of the two variables x and y.

Since the regression lines pass through $(\bar{x}, \bar{y})$ we have $\bar{x} = 4\bar{y} + 5$ and
$\bar{y} = \frac{1}{8}\bar{x} + 4$ (taking $k = \frac{1}{8}$).

Solving for $\bar{x}$ and $\bar{y}$ we get $\bar{x} = 42$ and $\bar{y} = 9.25$.
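
Since both regression lines pass through $(\bar{x}, \bar{y})$, the means are the solution of two linear equations. A minimal sketch using Cramer's rule follows; the function `intersection` and the coefficient layout (a, b, c) for the line ax + by = c are assumptions made for this illustration.

```python
from fractions import Fraction

def intersection(line1, line2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule."""
    a1, b1, c1 = map(Fraction, line1)
    a2, b2, c2 = map(Fraction, line2)
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("the lines are parallel or coincident")
    return (c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det

# x = 4y + 5  ->  x - 4y = 5 ;  y = x/8 + 4  ->  -x/8 + y = 4
x_bar, y_bar = intersection((1, -4, 5), (Fraction(-1, 8), 1, 4))
print(float(x_bar), float(y_bar))  # 42.0 9.25
```

The same helper applied to the lines 3x + 2y = 26 and 6x + y = 31 of Problem 4 gives the means 4 and 7.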


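Problem 6. The variables x and y are connected by the equation
ax+by+c=0. Show that $\gamma_{xy} = -1$ or 1 according as a and b are of the
same sign or of opposite sign.


Solution. Writing ax + by + c = 0 in the form $y = -\frac{a}{b}x - \frac{c}{b}$ we get the
regression coefficient of y on x as $b_{yx} = -\frac{a}{b}$.
Similarly, writing it in the form $x = -\frac{b}{a}y - \frac{c}{a}$ we get the
regression coefficient of x on y as $b_{xy} = -\frac{b}{a}$.

Now $\gamma^2 = b_{yx}b_{xy} = \left(-\frac{a}{b}\right)\left(-\frac{b}{a}\right) = 1$
Suppose a and b are of the same sign. Then $b_{yx}$ and $b_{xy}$ are negative.
Hence $\gamma = -1$.
Suppose a and b are of opposite sign. Then $b_{yx}$ and $b_{xy}$ are positive.
Hence $\gamma = 1$.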


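Problem 7. If $\theta$ is the acute angle between the two regression lines, show
that $\sin\theta \le 1 - \gamma^2$.

Solution. We know that if $\theta$ is the acute angle between the two
regression lines we have $\tan\theta = \left(\frac{1 - \gamma^2}{\gamma}\right)\left(\frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}\right)$ ........ (1)

We claim that $\sigma_x^2 + \sigma_y^2 \ge 2\sigma_x\sigma_y$.
Suppose not; then $\sigma_x^2 + \sigma_y^2 < 2\sigma_x\sigma_y$
(i.e.) $\sigma_x^2 + \sigma_y^2 - 2\sigma_x\sigma_y < 0$
(i.e.) $(\sigma_x - \sigma_y)^2 < 0$. This is impossible.
Hence $\sigma_x^2 + \sigma_y^2 \ge 2\sigma_x\sigma_y$.

$\therefore \frac{\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2} \le \frac{1}{2}$

$\therefore$ From (1) we get $\tan\theta \le \left(\frac{1 - \gamma^2}{\gamma}\right)\left(\frac{1}{2}\right)$

$\therefore \tan\theta \le \frac{1 - \gamma^2}{2\gamma}$. Hence $\sin\theta \le \frac{1 - \gamma^2}{1 + \gamma^2}$

$\therefore \sin\theta \le 1 - \gamma^2$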


Exercises.


1. Calculate the coefficient of correlation and obtain the lines
of regression for the following data.

x | 1 | 2 | 3  | 4  | 5  | 6  | 7  | 8  | 9
y | 9 | 8 | 10 | 12 | 11 | 13 | 14 | 16 | 15

2. The following table shows the ages x and blood pressure
y of 12 women. (i) Find the correlation coefficient between x and y.
(ii) Determine the regression equation of y on x.
(iii) Estimate the blood pressure of a woman whose age is 45
years.

Age (x) | Blood pressure (y) | Age (x) | Blood pressure (y)
56      | 147                | 55      | 150
42      | 125                | 49      | 145
72      | 160                | 38      | 115
36      | 118                | 42      | 140
63      | 149                | 68      | 152
47      | 128                | 60      | 155

3. Calculate the coefficient of correlation for the following data.

x | 3 | 6 | 5 | 4 | 4 | 6 | 7 | 5
y | 3 | 2 | 3 | 5 | 3 | 6 | 6 | 4

UNIT-IX CORRELATION COEFFICIENT FOR A BIVARIATE FREQUENCY DISTRIBUTION


9.1 INTRODUCTION


As in the case of a single variable, a large collection of data
corresponding to two variables under consideration can also be
conveniently summarised in a 2-way frequency table (bivariate frequency
table), which is illustrated below.


If there are n classes for the first variable x and m classes for the
second variable y then there are mn cells in the 2-way table. If $x_i$ and $y_j$
denote the mid values of the i-th class for x and the j-th class for y respectively,
then the frequency $f_{ij}$ corresponding to $(x_i, y_j)$ is entered in the (i, j) cell.
For i = 1, 2, ..., n and j = 1, 2, ..., m we get all the mn cells in the table.


From the bivariate frequency table we note the following:

(1) For any fixed i we have $\sum_{j=1}^{m} f_{ij} = g_i$ = the sum of all the cell
frequencies of the i-th column.

(2) For any fixed j we have $\sum_{i=1}^{n} f_{ij} = f_j$ = the sum of the cell frequencies
of the j-th row.

Mid points of y \ Mid points of x | x_1  | x_2  | ... | x_i  | ... | x_n  | Total frequencies of y
y_1                               | f_11 | f_21 | ... | f_i1 | ... | f_n1 | f_1
y_2                               | f_12 | f_22 | ... | f_i2 | ... | f_n2 | f_2
...                               | ...  | ...  | ... | ...  | ... | ...  | ...
y_j                               | f_1j | f_2j | ... | f_ij | ... | f_nj | f_j
...                               | ...  | ...  | ... | ...  | ... | ...  | ...
y_m                               | f_1m | f_2m | ... | f_im | ... | f_nm | f_m
Total frequencies of x            | g_1  | g_2  | ... | g_i  | ... | g_n  | N

(3) If the total frequency of all the mn cells is N then
$N = \sum_{i=1}^{n} g_i = \sum_{j=1}^{m} f_j$ and $N = \sum_{i=1}^{n}\sum_{j=1}^{m} f_{ij}$

(4) $\bar{x} = \frac{1}{N}\sum_{i=1}^{n}\sum_{j=1}^{m} f_{ij}x_i = \frac{1}{N}\sum_{i=1}^{n} x_i\left(\sum_{j=1}^{m} f_{ij}\right) = \frac{1}{N}\sum_{i=1}^{n} g_i x_i$, from (1)

(5) Similarly $\bar{y} = \frac{1}{N}\sum_{j=1}^{m} f_j y_j$

(6) $\sigma_x^2 = \frac{1}{N}\sum_{i=1}^{n}\sum_{j=1}^{m} f_{ij}x_i^2 - \bar{x}^2 = \frac{1}{N}\sum_{i=1}^{n} g_i x_i^2 - \bar{x}^2$

(7) Similarly $\sigma_y^2 = \frac{1}{N}\sum_{j=1}^{m} f_j y_j^2 - \bar{y}^2$

(8) $\mathrm{cov}(x, y) = \frac{1}{N}\sum_{i=1}^{n}\sum_{j=1}^{m} f_{ij}x_i y_j - \bar{x}\bar{y}$

The correlation coefficient between x and y is given by $\gamma_{xy} = \frac{\mathrm{cov}(x, y)}{\sigma_x\sigma_y}$

$\therefore \gamma_{xy} = \frac{\sum\sum f_{ij}x_i y_j - \frac{1}{N}\left(\sum g_i x_i\right)\left(\sum f_j y_j\right)}{\sqrt{\sum g_i x_i^2 - \frac{1}{N}\left(\sum g_i x_i\right)^2}\,\sqrt{\sum f_j y_j^2 - \frac{1}{N}\left(\sum f_j y_j\right)^2}}$


Note. Since the correlation coefficient is independent of origin and scale, if
x and y are transformed to u and v by the formulae $u = \frac{x - A}{h}$ and $v = \frac{y - B}{k}$
then we have $\gamma_{xy} = \gamma_{uv}$.


Solved problems 


Problem 1. Find the correlation coefficient between x and y from the
following table.

y \ x | 5 | 10 | 15 | 20
4     | 2 | 4  | 5  | 4
6     | 5 | 3  | 6  | 2
8     | 3 | 8  | 2  | 3

Solution:

y \ x      | x_1 = 5  | x_2 = 10 | x_3 = 15 | x_4 = 20 | Total
y_1 = 4    | 2        | 4        | 5        | 4        | f_1 = 15
y_2 = 6    | 5        | 3        | 6        | 2        | f_2 = 16
y_3 = 8    | 3        | 8        | 2        | 3        | f_3 = 16
Total      | g_1 = 10 | g_2 = 15 | g_3 = 13 | g_4 = 9  | N = 47

The correlation coefficient between x and y is given by

$\gamma_{xy} = \frac{\sum\sum f_{ij}x_i y_j - \frac{1}{N}\left(\sum g_i x_i\right)\left(\sum f_j y_j\right)}{\sqrt{\sum g_i x_i^2 - \frac{1}{N}\left(\sum g_i x_i\right)^2}\,\sqrt{\sum f_j y_j^2 - \frac{1}{N}\left(\sum f_j y_j\right)^2}}$

where i = 1, 2, 3, 4 and j = 1, 2, 3.

$\sum g_i x_i = 50 + 150 + 195 + 180 = 575$

$\sum f_j y_j = 60 + 96 + 128 = 284$

$\sum g_i x_i^2 = 250 + 1500 + 2925 + 3600 = 8275$

$\sum f_j y_j^2 = 240 + 576 + 1024 = 1840$

$\sum\sum f_{ij}x_i y_j = (40 + 160 + 300 + 320) + (150 + 180 + 540 + 240) + (120 + 640 + 240 + 480) = 3410$

$\therefore \gamma_{xy} = \frac{3410 - \frac{1}{47}(575 \times 284)}{\sqrt{8275 - \frac{1}{47}(575)^2}\,\sqrt{1840 - \frac{1}{47}(284)^2}}$

$= \frac{3410 \times 47 - (575 \times 284)}{\sqrt{8275 \times 47 - 575^2}\,\sqrt{1840 \times 47 - 284^2}}$

$= \frac{160270 - 163300}{\sqrt{388925 - 330625}\,\sqrt{86480 - 80656}}$

$= \frac{-3030}{\sqrt{58300}\,\sqrt{5824}} = \frac{-3030}{241.5 \times 76.3} = \frac{-3030}{18426.5}$

= -0.16

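The same computation can be sketched for any bivariate frequency table. The function name and the convention `table[j][i]` (row j for $y_j$, column i for $x_i$) are assumptions made for this illustration.

```python
from math import sqrt

def bivariate_corr(x_mid, y_mid, table):
    """table[j][i] is the frequency of the cell (x_mid[i], y_mid[j])."""
    g = [sum(row[i] for row in table) for i in range(len(x_mid))]  # column totals g_i
    f = [sum(row) for row in table]                                # row totals f_j
    n = sum(f)
    sx = sum(gi * x for gi, x in zip(g, x_mid))
    sy = sum(fj * y for fj, y in zip(f, y_mid))
    sxx = sum(gi * x * x for gi, x in zip(g, x_mid))
    syy = sum(fj * y * y for fj, y in zip(f, y_mid))
    sxy = sum(table[j][i] * x_mid[i] * y_mid[j]
              for j in range(len(y_mid)) for i in range(len(x_mid)))
    return (n * sxy - sx * sy) / (sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2))

table = [[2, 4, 5, 4],
         [5, 3, 6, 2],
         [3, 8, 2, 3]]
print(round(bivariate_corr([5, 10, 15, 20], [4, 6, 8], table), 2))  # -0.16
```

Empty cells of a table (shown as "-") are entered as 0.
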
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Problem 2. Find the correlation coefficient between heights and 


weights of 100 students, which are distributed as follows.

Height in cm | Weight in kg: 30-40 | 40-50 | 50-60 | 60-70 | 70-80 | Total
150-155      | 1                   | 3     | 7     | 5     | 2     | 18
155-160      | 2                   | 4     | 10    | 7     | 4     | 27
160-165      | 1                   | 5     | 12    | 10    | 7     | 35
165-170      | -                   | 3     | 8     | 6     | 3     | 20
Total        | 4                   | 15    | 37    | 28    | 16    | 100


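Solution: Let $x_i$ denote the mid value of the classes of weights and $y_j$
denote the mid value of the classes of heights.

Let $u_i = \frac{x_i - 55}{10}$ and $v_j = \frac{y_j - 157.5}{5}$.

Then the 2-way frequency table is given below. The figure in brackets in
each cell is $f_{ij}u_i v_j$.

y_j (v_j) \ x_i (u_i) | 35 (-2) | 45 (-1) | 55 (0) | 65 (1)  | 75 (2)  | f_j | f_j v_j | f_j v_j^2 | Sum over i of f_ij u_i v_j
152.5 (-1)            | 1 (2)   | 3 (3)   | 7 (0)  | 5 (-5)  | 2 (-4)  | 18  | -18     | 18        | -4
157.5 (0)             | 2 (0)   | 4 (0)   | 10 (0) | 7 (0)   | 4 (0)   | 27  | 0       | 0         | 0
162.5 (1)             | 1 (-2)  | 5 (-5)  | 12 (0) | 10 (10) | 7 (14)  | 35  | 35      | 35        | 17
167.5 (2)             | -       | 3 (-6)  | 8 (0)  | 6 (12)  | 3 (12)  | 20  | 40      | 80        | 18
g_i                   | 4       | 15      | 37     | 28      | 16      | 100 | 57      | 133       | 31
g_i u_i               | -8      | -15     | 0      | 28      | 32      | 37  |         |           |
g_i u_i^2             | 16      | 15      | 0      | 28      | 64      | 123 |         |           |
Sum over j of f_ij u_i v_j | 0  | -8      | 0      | 17      | 22      | 31  |         |           |

$\gamma_{xy} = \gamma_{uv} = \frac{\sum\sum f_{ij}u_i v_j - \frac{1}{N}\left(\sum g_i u_i\right)\left(\sum f_j v_j\right)}{\sqrt{\sum g_i u_i^2 - \frac{1}{N}\left(\sum g_i u_i\right)^2}\,\sqrt{\sum f_j v_j^2 - \frac{1}{N}\left(\sum f_j v_j\right)^2}}$

$= \frac{31 - \frac{1}{100}(37 \times 57)}{\sqrt{123 - \frac{1}{100}(37)^2}\,\sqrt{133 - \frac{1}{100}(57)^2}}$

$= \frac{3100 - 37 \times 57}{\sqrt{12300 - 37^2}\,\sqrt{13300 - 57^2}} = \frac{991}{\sqrt{10931}\,\sqrt{10051}}$

$= \frac{991}{104.5 \times 100.25} = \frac{991}{10476}$

= 0.095 (approximately)
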
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Exercises 


1. Calculate the coefficient of correlation between the ages of husbands
and wives from the following data.

Ages of wives in years | Ages of husbands in years: 20-25 | 25-30 | 30-35 | Total
16-20                  | 9                                | 14    | -     | 23
20-24                  | 6                                | 11    | 3     | 20
24-28                  | -                                | -     | 7     | 7
Total                  | 15                               | 25    | 10    | 50


2. Calculate the coefficient of correlation between the ages of a hundred
mothers and daughters from the following data.

Ages of mothers in years | Ages of daughters in years: 5-10 | 10-15 | 15-20 | 20-25 | 25-30 | Total
15-25                    | 6                                | 3     | -     | -     | -     | 9
25-35                    | 3                                | 16    | 10    | -     | -     | 29
35-45                    | -                                | 10    | 15    | 7     | -     | 32
45-55                    | -                                | -     | 7     | 10    | 4     | 21
55-65                    | -                                | -     | -     | 4     | 5     | 9
Total                    | 9                                | 29    | 32    | 21    | 9     | 100

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UNIT-X INTERPOLATION 


10.1 INTRODUCTION 


Definition. Interpolation is the process of finding the most appropriate
estimate for missing data. It is the "art of reading between the lines of a
table". For making the most probable estimate it requires the following
assumptions.


(i) The frequency distribution is normal and not marked by sudden ups
and downs.


(ii) The changes in the series are uniform within a period.


The interpolation technique is used in various disciplines like
economics, business, population studies, price determination, etc. It is
used to fill in the gaps in the statistical data for the sake of continuity of
information. For example, if we know the records for the past five years
except the third year, which is not available due to unforeseen conditions,
the interpolation technique helps to estimate the record for that year too,
under the assumption that the changes in the records over these five years
have been uniform.


It is also possible that we may require information for the future,
in which case the process of estimating the most appropriate value is
known as extrapolation. There are two methods of interpolation.


(i) Graphic method
(ii) Algebraic method


(i) Graphic method is a simple method in which we just plot the
available data on a graph sheet and read off the value for the missing
period from the graph itself.


(ii) Algebraic method. There are several methods used for interpolation,
of which we deal with the following:
(i) Finite differences
(ii) Gregory-Newton's formula
(iii) Lagrange's formula

10.2 FINITE DIFFERENCES. 


The operator $\Delta$. Let $U_x$ be a function of the independent variable $x$, and let
$a, a+h, a+2h, \dots$ be a finite set of equidistant values of $x$; then
$U_a, U_{a+h}, U_{a+2h}, \dots$ are the corresponding values of $U_x$. The values of
the independent variable $x$ are called arguments, the corresponding
values of $U_x$ are called entries and $h$ is known as the interval of
differencing. Hence $U_{a+h}$ is the entry for the argument $a+h$.

Definition. We define an operator $\Delta$, known as the first order
difference on $U_x$, as $\Delta U_x = U_{x+h} - U_x$ where $x = a, a+h, a+2h, \dots$ In
particular (i) $\Delta U_a = U_{a+h} - U_a$ (ii) $\Delta U_x = 0$ if $U_x$ is constant.


Higher order differences can similarly be defined.
Example. $\Delta^2 U_x = \Delta(\Delta U_x) = \Delta(U_{x+h} - U_x)$
$= (U_{x+2h} - U_{x+h}) - (U_{x+h} - U_x)$ ......(1)
$= U_{x+2h} - 2U_{x+h} + U_x$

Also from (1), $\Delta^2 U_x = \Delta U_{x+h} - \Delta U_x$.
Note 1. Unless otherwise stated, the interval of differencing is taken as 1.


Note 2. It is very easy to verify that the operator $\Delta$ satisfies the basic
laws of algebra.


(i) $\Delta$ is linear, (i.e.) $\Delta(aU_x + bV_x) = a\Delta U_x + b\Delta V_x$.
(ii) $\Delta$ satisfies the law of indices for multiplication,
(i.e.) $\Delta^m \Delta^n U_x = \Delta^{m+n} U_x$.


We can construct the difference table for any number of arguments.
A sample difference table is exhibited below for five consecutive
arguments.

Argument  Entry        1st diff           2nd diff             3rd diff             4th diff
$x$       $U_x$        $\Delta U_x$       $\Delta^2 U_x$       $\Delta^3 U_x$       $\Delta^4 U_x$
$a$       $U_a$
                       $\Delta U_a$
$a+h$     $U_{a+h}$                       $\Delta^2 U_a$
                       $\Delta U_{a+h}$                        $\Delta^3 U_a$
$a+2h$    $U_{a+2h}$                      $\Delta^2 U_{a+h}$                        $\Delta^4 U_a$
                       $\Delta U_{a+2h}$                       $\Delta^3 U_{a+h}$
$a+3h$    $U_{a+3h}$                      $\Delta^2 U_{a+2h}$
                       $\Delta U_{a+3h}$
$a+4h$    $U_{a+4h}$


In this table $U_a$ is known as the first entry and $\Delta U_a, \Delta^2 U_a, \Delta^3 U_a, \Delta^4 U_a$
are called leading differences.


Example. The difference table for the following data is given below.

$x$     0    1    2    3    4
$U_x$   8    11   9    15   6

$x$   $U_x$   $\Delta U_x$   $\Delta^2 U_x$   $\Delta^3 U_x$   $\Delta^4 U_x$
0     8
              3
1     11                     -5
              -2                              13
2     9                      8                                 -36
              6                               -23
3     15                     -15
              -9
4     6
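
Illustration. The table above can be reproduced mechanically. The following is a minimal Python sketch, not part of the original text; the function name `forward_differences` is a hypothetical helper introduced only for this illustration. Each new column is obtained by subtracting neighbouring entries of the previous column.

```python
def forward_differences(entries):
    """Return the columns [U, ΔU, Δ²U, ...] of a forward difference table.

    `entries` are the tabulated values U_a, U_{a+h}, ... at equally
    spaced arguments (the interval h itself is not needed here).
    """
    columns = [list(entries)]
    while len(columns[-1]) > 1:
        previous = columns[-1]
        # Each difference is (next entry) - (current entry).
        columns.append([b - a for a, b in zip(previous, previous[1:])])
    return columns


# Data of the example: U_0..U_4 = 8, 11, 9, 15, 6
table = forward_differences([8, 11, 9, 15, 6])
for order, column in enumerate(table):
    print(f"order {order}: {column}")

# The leading differences are the first element of every column.
leading = [column[0] for column in table]
print("leading differences:", leading[1:])
```

The first element of each column gives the leading differences 3, -5, 13, -36 of the example.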


The differences $\Delta U_x, \Delta^2 U_x$ etc. are called forward differences, since
they involve functional values which are to the right of $U_x$. In contrast to
the forward differences we have another kind of differences known as
backward differences, which involve functional values to the left of
$U_x$. For this purpose we define another operator $\nabla$ (nabla) as


$\nabla U_x = U_x - U_{x-h}$, where $x = a, a+h, a+2h, \dots, a+nh$.
This is called the first order backward difference of $U_x$.


We note that $\nabla U_{x+h} = \Delta U_x$. Thus there is a mutual relation
between $\Delta$ and $\nabla$.


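The backward difference table for 5 consecutive arguments is
given below.

$x$      $U_x$        $\nabla U_x$        $\nabla^2 U_x$        $\nabla^3 U_x$        $\nabla^4 U_x$
$a$      $U_a$
                      $\nabla U_{a+h}$
$a+h$    $U_{a+h}$                        $\nabla^2 U_{a+2h}$
                      $\nabla U_{a+2h}$                         $\nabla^3 U_{a+3h}$
$a+2h$   $U_{a+2h}$                       $\nabla^2 U_{a+3h}$                         $\nabla^4 U_{a+4h}$
                      $\nabla U_{a+3h}$                         $\nabla^3 U_{a+4h}$
$a+3h$   $U_{a+3h}$                       $\nabla^2 U_{a+4h}$
                      $\nabla U_{a+4h}$
$a+4h$   $U_{a+4h}$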

Higher order backward differences can similarly be defined, for example


$\nabla^2 U_x = \nabla(\nabla U_x) = \nabla U_x - \nabla U_{x-h}$.


Further it can easily be verified that $\nabla^n U_{a+nh} = \Delta^n U_a$.


Hence the same forward difference table constructed for $U_x$ can
as well be employed to find out the backward differences of $U_x$ and its
higher order differences.


For example, from the above example, $\nabla^4 U_4 = -36 = \Delta^4 U_0$ etc.

The operator $E$.
Definition. The operator $E$ on $U_x$ is defined as $E U_x = U_{x+h}$.
The higher order operators of $E$ can similarly be defined.
Generally $E^n U_x = U_{x+nh}$.
If $h = 1$, then $E^n U_x = U_{x+n}$.
For example, $E^4 U_1 = U_{1+4} = U_5$
$E^3 U_4 = U_{4+3} = U_7$
Ec eus ems 
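Theorem 10.1 (i) $E = 1 + \Delta$ (ii) $E = (1 - \nabla)^{-1}$
Proof. Let $U_x$ be a function of $x$.
(i) $\Delta U_x = U_{x+h} - U_x$ (by definition)
$= E U_x - U_x$
$= (E - 1) U_x$
$\therefore \Delta = E - 1$. Hence $E = 1 + \Delta$.
(ii) $\nabla U_x = U_x - U_{x-h}$
$\therefore U_{x-h} = U_x - \nabla U_x$
$= (1 - \nabla) U_x$
$\therefore E^{-1} U_x = (1 - \nabla) U_x$ (since $E U_{x-h} = U_x \Rightarrow U_{x-h} = E^{-1} U_x$)


$\therefore E^{-1} = 1 - \nabla$. Hence $E = (1 - \nabla)^{-1}$.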


Note 1. It is very easy to verify that the operator $E$ also satisfies the basic
laws of algebra such as linearity and the law of indices for multiplication.


Lemma. The two operators $\Delta$ and $E$ are commutative under composition
of operations, (i.e.) $\Delta \circ E = E \circ \Delta$.

Proof. $(\Delta \circ E)(U_x) = \Delta(E(U_x)) = \Delta(U_{x+h}) = U_{x+2h} - U_{x+h}$
$= E U_{x+h} - E U_x = E(U_{x+h} - U_x)$
$= E(\Delta(U_x)) = (E \circ \Delta)(U_x)$


Hence $\Delta \circ E = E \circ \Delta$.




Theorem 10.2 (Fundamental theorem for the finite differences)


If $U_x$ is a polynomial of degree $n$ then
$\Delta^r U_x = \begin{cases} \text{constant} & \text{if } r = n \\ 0 & \text{if } r > n \end{cases}$


(i.e.) the $n$th order difference of a polynomial of degree $n$ is constant and
differences of order higher than $n$ are zero.


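Proof. Let $U_x = a_0 x^n + a_1 x^{n-1} + \dots + a_{n-1} x + a_n$,
where $a_0, a_1, a_2, \dots, a_n$ are constants and $a_0 \neq 0$.
$\Delta U_x = U_{x+h} - U_x$
$= [a_0 (x+h)^n + a_1 (x+h)^{n-1} + \dots + a_{n-1}(x+h) + a_n]$
$\quad - [a_0 x^n + a_1 x^{n-1} + \dots + a_{n-1} x + a_n]$


$= [a_0 (x^n + \binom{n}{1} x^{n-1} h + \dots + h^n) + a_1 (x^{n-1} + \binom{n-1}{1} x^{n-2} h + \dots + h^{n-1}) + \dots + a_n]$
$\quad - [a_0 x^n + a_1 x^{n-1} + \dots + a_{n-1} x + a_n]$


$= a_0 n h x^{n-1} + b_2 x^{n-2} + \dots + b_{n-1} x + b_n$, where
$b_2, b_3, \dots, b_n$ are constants independent of $x$ and $a_0 n h \neq 0$.


$\therefore \Delta U_x$ is a polynomial of degree $n-1$.
Continuing this process we get that $\Delta^2 U_x$ is a polynomial of degree $n-2$,
$\Delta^3 U_x$ is a polynomial of degree $n-3$, and so on, until


$\Delta^n U_x = a_0\, n(n-1)(n-2) \cdots 2 \cdot 1\, h^n x^0 = a_0\, n!\, h^n$ = constant ....(1)


and $\Delta^r U_x = 0$ for $r > n$.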
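
Illustration. The theorem can be checked numerically. The sketch below is not part of the original text: the cubic `U` and the interval `h = 2` are arbitrary values assumed only for this check, and `nth_difference` is a hypothetical helper. For a cubic with leading coefficient 2 the theorem predicts a third difference of $2 \cdot 3! \cdot h^3$ at every argument and a vanishing fourth difference.

```python
from math import factorial


def nth_difference(f, x, n, h=1):
    """Compute Δ^n f(x) with interval of differencing h by repeated subtraction."""
    values = [f(x + k * h) for k in range(n + 1)]
    for _ in range(n):
        values = [b - a for a, b in zip(values, values[1:])]
    return values[0]


def U(x):
    # Arbitrary cubic chosen for the check (an assumption, not from the text).
    return 2 * x**3 - 5 * x + 7


h = 2
third = [nth_difference(U, x, 3, h) for x in range(5)]
fourth = nth_difference(U, 0, 4, h)
expected = 2 * factorial(3) * h**3  # a_0 * n! * h^n

print(third, expected, fourth)
```

All third differences agree with $a_0\, n!\, h^n$ regardless of the starting argument, as the theorem asserts.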


Note. In particular, if the interval of differencing is unity and $U_x = a_0 x^n$,
then $\Delta^n (a_0 x^n) = a_0\, n(n-1) \cdots 2 \cdot 1$ (using (1))
$\therefore a_0 \Delta^n (x^n) = a_0\, n!$


$\therefore \Delta^n (x^n) = n!$

Solved Problems.


Problem 1. Find the second order differences for (i) $U_x = ab^{cx}$


(ii) $U_x = \dfrac{x}{x^2 + 7x + 12}$, taking the interval of differencing as 1.


Solution. (i) $\Delta U_x = U_{x+h} - U_x = ab^{c(x+h)} - ab^{cx} = ab^{cx} b^{ch} - ab^{cx}$
$= ab^{cx}(b^{ch} - 1)$


$\Delta^2 U_x = (b^{ch} - 1)\Delta(ab^{cx}) = (b^{ch} - 1)^2 ab^{cx}$


(ii) $U_x = \dfrac{x}{(x+3)(x+4)} = \dfrac{4}{x+4} - \dfrac{3}{x+3}$ (by partial fractions)

$\therefore \Delta U_x = \left[\dfrac{4}{x+5} - \dfrac{3}{x+4}\right] - \left[\dfrac{4}{x+4} - \dfrac{3}{x+3}\right]$

$= \dfrac{4}{x+5} - \dfrac{7}{x+4} + \dfrac{3}{x+3}$

Similarly $\Delta^2 U_x = \dfrac{4}{x+6} - \dfrac{11}{x+5} + \dfrac{10}{x+4} - \dfrac{3}{x+3}$ (verify)


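Problem 2. Find $\Delta^n \sin x$ taking $h = 1$.


Solution. $\Delta \sin x = \sin(x+1) - \sin x = 2\cos\left(x + \tfrac{1}{2}\right)\sin\left(\tfrac{1}{2}\right)$
$= 2\sin\left(\tfrac{1}{2}\right)\sin\left(x + \tfrac{1}{2} + \tfrac{\pi}{2}\right)$
Now $\Delta^2 \sin x = \Delta\left[2\sin\left(\tfrac{1}{2}\right)\sin\left(x + \tfrac{1}{2} + \tfrac{\pi}{2}\right)\right]$
$= 2\sin\left(\tfrac{1}{2}\right)\left[\sin\left(x + 1 + \tfrac{1}{2} + \tfrac{\pi}{2}\right) - \sin\left(x + \tfrac{1}{2} + \tfrac{\pi}{2}\right)\right]$
$= 2\sin\left(\tfrac{1}{2}\right)\left[2\cos\left(x + 1 + \tfrac{\pi}{2}\right)\sin\left(\tfrac{1}{2}\right)\right]$
$= \left[2\sin\left(\tfrac{1}{2}\right)\right]^2 \sin\left(x + 2\left(\tfrac{1}{2} + \tfrac{\pi}{2}\right)\right)$


Proceeding like this we get $\Delta^n \sin x = \left[2\sin\left(\tfrac{1}{2}\right)\right]^n \sin\left(x + n\left(\tfrac{1}{2} + \tfrac{\pi}{2}\right)\right)$


Problem 3. Prove that $\Delta(\log U_x) = \log\left(1 + \dfrac{\Delta U_x}{U_x}\right)$


Solution. $\Delta(\log U_x) = \log U_{x+h} - \log U_x = \log\left(\dfrac{U_{x+h}}{U_x}\right)$


$= \log\left(\dfrac{U_x + \Delta U_x}{U_x}\right)$


$= \log\left(1 + \dfrac{\Delta U_x}{U_x}\right)$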


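Problem 4. Evaluate $\dfrac{\Delta^2 x^3}{E x^2}$ taking $h = 1$.


Solution. $\Delta x^3 = (x+1)^3 - x^3 = 3x^2 + 3x + 1$


$\Delta^2 x^3 = \Delta(\Delta x^3) = \Delta(3x^2 + 3x + 1)$
$= 3\Delta x^2 + 3\Delta x + \Delta(1)$
$= 3[(x+1)^2 - x^2] + 3[(x+1) - x] + 0$
$= 6(x+1)$
Now $E x^2 = (x+1)^2$


$\therefore \dfrac{\Delta^2 x^3}{E x^2} = \dfrac{6(x+1)}{(x+1)^2} = \dfrac{6}{x+1}$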
Problem 5. Evaluate $\Delta^3 [(1 - ax)(1 - bx)(1 - cx)]$
Solution. Let $U_x = (1 - ax)(1 - bx)(1 - cx)$


The polynomial whose third order difference is to be calculated is a third
degree polynomial and the coefficient of the $x^3$ term is $-abc$.


Since $\Delta^r U_x = \begin{cases} 0 & \text{if } r > 3 \\ \text{constant} & \text{if } r = 3 \end{cases}$ we have


$\Delta^3 U_x = \Delta^3 (-abc\, x^3) = -abc\, \Delta^3 (x^3) = -abc \cdot 3!$
$= -6abc$


Problem 6. If $U_0 = 1$, $U_1 = 5$, $U_2 = 8$, $U_3 = 3$, $U_4 = 7$, $U_5 = 0$ find
$\Delta^5 U_0$.


Solution. Consider $\Delta^5 U_0 = (E - 1)^5 U_0$
$= (E^5 - 5E^4 + 10E^3 - 10E^2 + 5E - 1) U_0$


$= E^5 U_0 - 5E^4 U_0 + 10E^3 U_0 - 10E^2 U_0 + 5E U_0 - U_0$
$= U_5 - 5U_4 + 10U_3 - 10U_2 + 5U_1 - U_0$


$= 0 - (5 \times 7) + (10 \times 3) - (10 \times 8) + (5 \times 5) - 1$


$= -35 + 30 - 80 + 25 - 1$


$= -61$
Alternatively, this can be obtained by forming the difference table as shown
below.

$x$   $U_x$   $\Delta U_x$   $\Delta^2 U_x$   $\Delta^3 U_x$   $\Delta^4 U_x$   $\Delta^5 U_x$
0     1
              4
1     5                      -1
              3                               -7
2     8                      -8                                24
              -5                              17                                -61
3     3                      9                                 -37
              4                               -20
4     7                      -11
              -7
5     0


Hence $\Delta^5 U_0 = -61$
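
Illustration. The operator expansion used in the first solution generalises to any order, since $\Delta^n = (E - 1)^n = \sum_k (-1)^k \binom{n}{k} E^{n-k}$. The following minimal Python sketch is not part of the original text; `leading_difference` is a hypothetical helper name, and the entries are assumed to be tabulated at unit spacing starting from $U_0$.

```python
from math import comb


def leading_difference(entries, n):
    """Δ^n U_0 from the binomial expansion of (E - 1)^n.

    entries[k] is U_k; the expansion gives
    Δ^n U_0 = Σ_{k=0}^{n} (-1)^k C(n, k) U_{n-k}.
    """
    return sum((-1) ** k * comb(n, k) * entries[n - k] for k in range(n + 1))


U = [1, 5, 8, 3, 7, 0]            # data of Problem 6
print(leading_difference(U, 5))   # fifth leading difference
print(leading_difference(U, 2))   # second leading difference, U_2 - 2U_1 + U_0
```

The same function reproduces every leading difference of the table without building the table itself.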


Problem 7. Estimate the missing term in the following table.

$x$     0    1    2    3    4
$U_x$   1    3    9    -    81

Explain why the resulting value differs from $3^3$.

Solution. Let the missing term in $U_x$ be $a$.


The difference table is given below.

$x$   $U_x$   $\Delta U_x$   $\Delta^2 U_x$   $\Delta^3 U_x$   $\Delta^4 U_x$
0     1
              2
1     3                      4
              6                               $a-19$
2     9                      $a-15$                            $124-4a$
              $a-9$                           $105-3a$
3     $a$                    $90-2a$
              $81-a$
4     81


Since 4 values of $U_x$ are given, we assume $U_x$ to be a polynomial of degree 3.
Hence by the fundamental theorem of finite differences $\Delta^4 U_x = 0$ for all $x$.


In particular $\Delta^4 U_0 = 0$. Hence $124 - 4a = 0$
$\therefore a = 31$
The tabulated values follow $U_x = 3^x$, which is not a polynomial, so the
estimate 31 differs from the true value $3^3 = 27$.
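
Illustration. The condition $\Delta^n U_0 = 0$ is linear in the unknown entry, so the missing term can be solved for directly. The sketch below is not part of the original text; `missing_entry` is a hypothetical helper, it assumes unit spacing and exactly one missing value (marked `None`), and it assumes, as in the solution above, that the $m$ known values lie on a polynomial of degree $m - 1$.

```python
from fractions import Fraction
from math import comb


def missing_entry(entries):
    """Estimate the single missing value (None) by imposing Δ^n U_0 = 0,
    where n = len(entries) - 1.

    Δ^n U_0 = Σ_j (-1)^(n-j) C(n, j) U_j, so the unknown U_m satisfies
    coeff_m * U_m + (sum over known terms) = 0.
    """
    n = len(entries) - 1
    m = entries.index(None)
    known = sum(
        (-1) ** (n - j) * comb(n, j) * u
        for j, u in enumerate(entries)
        if u is not None
    )
    coeff = (-1) ** (n - m) * comb(n, m)
    return Fraction(-known, coeff)


estimate = missing_entry([1, 3, 9, None, 81])   # data of Problem 7
print(estimate, "versus the exact value", 3**3)
```

Exact rational arithmetic is used so that the answer is not affected by rounding.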
Exercises. 


1.Find the first order differences for the following functions 
taking the interval of differencing as 1. 


Mg Nr. Vous —4 : X XFl 
e (ii) 2% (i)tan^ x  (iv)x(x-D3* (v) eD 
i x—4 

2. Prove that $\Delta \tan cx = \dfrac{\sin c}{\cos cx \, \cos[c(x+1)]}$, $c$ being constant.


3. Prove that (i) $\Delta(U_x V_x) = U_x \Delta V_x + V_{x+h} \Delta U_x$


(ii) $\Delta\left(\dfrac{U_x}{V_x}\right) = \dfrac{V_x \Delta U_x - U_x \Delta V_x}{V_x V_{x+h}}$


4J1f U,=2x? — x? + 3x +1 find A°U, taking the interval of 
differencing as unity. 


5. Show that $\Delta^3 U_x = 6ah^3$, where $U_x = ax^3 + bx^2 + cx + d$ and
$h$ is the interval of differencing.


6. Evaluate (i) A’x? taking h as interval of differencing. 
(ii) A*[(1 — 2x) (1 — 3x)(1 — 5x) (1 — 6x) ] taking h=1. 
Gii) A"[(1 — ax)(1 — bx?) (1 — cx?)(1— dx*)] taking 


h=1. 
7.Form the difference table for the following data 
X 1 2 3 4 5 
Ux 2 7 8 11 8 
10.3 NEWTON'S FORMULA


Consider the function $y = f(x)$. Let $f(x_0) = y_0, f(x_1) = y_1, \dots,$
$f(x_n) = y_n$. The replacement of $f(x)$ by a simple function $\phi(x)$ which also
assumes the values $y_0, y_1, \dots, y_n$ at the points $x_0, x_1, \dots, x_n$ is the basic
principle of interpolation. $\phi(x)$ is called a formula for interpolation and
we say that $\phi(x)$ represents $f(x)$. If $\phi(x)$ is a polynomial of degree $n$ then
$\phi(x)$ is called an interpolating polynomial. The existence of an
interpolating polynomial is supported by Weierstrass' approximation
theorem, which asserts that every continuous function on a closed interval
can be approximated by a polynomial.


We now describe the Newton-Gregory formula for
constructing an interpolating polynomial.


Theorem 10.3 (Newton-Gregory interpolation formula for equal
intervals)


Let $U_a, U_{a+h}, \dots, U_{a+nh}$ be the values of the function $U_x$ at the points $a$,
$a+h, a+2h, \dots, a+nh$, which are equally spaced with interval of differencing $h$.

Then $U_x = U_a + (x-a)\dfrac{\Delta U_a}{h} + (x-a)(x-a-h)\dfrac{\Delta^2 U_a}{2!\,h^2} + \dots$
$\quad + (x-a)(x-a-h)\cdots(x-a-(n-1)h)\dfrac{\Delta^n U_a}{n!\,h^n}$

Proof. Let $\phi(x)$ be an interpolating polynomial of degree $n$ which
represents $U_x$ in $a \le x \le a+nh$. Then $\phi(x)$ may be written in the form


$\phi(x) = A_0 + A_1(x-a) + A_2(x-a)(x-a-h) + A_3(x-a)(x-a-h)(x-a-2h) + \dots$
$\quad + A_n(x-a)(x-a-h)\cdots(x-a-(n-1)h)$ ......(1)
By definition of an interpolating polynomial we have $U_x = \phi(x)$ at $x = a, a+h, \dots, a+nh$.


Putting $x = a$ in (1) we get $A_0 = U_a$.


Putting $x = a+h$ in (1) we get $A_1 = \dfrac{1}{h}[U_{a+h} - A_0] = \dfrac{1}{h}[U_{a+h} - U_a]$


$\therefore A_1 = \dfrac{\Delta U_a}{h}$


Putting $x = a+2h$ in (1) we get


$A_2 = \dfrac{1}{2h^2}[U_{a+2h} - A_0 - 2hA_1]$
$= \dfrac{1}{2h^2}[U_{a+2h} - U_a - 2\Delta U_a]$
$= \dfrac{1}{2h^2}[U_{a+2h} - 2U_{a+h} + U_a]$
$= \dfrac{\Delta^2 U_a}{2h^2}$ (refer to the example on second order differences in 10.2)
Similarly, substituting $x = a+3h, \dots, a+nh$ we get
$A_3 = \dfrac{\Delta^3 U_a}{3!\,h^3}, \dots, A_n = \dfrac{\Delta^n U_a}{n!\,h^n}$


Substituting these values in (1) we get


$\phi(x) = U_a + (x-a)\dfrac{\Delta U_a}{h} + (x-a)(x-a-h)\dfrac{\Delta^2 U_a}{2!\,h^2} + \dots$
$\quad + (x-a)(x-a-h)\cdots(x-a-(n-1)h)\dfrac{\Delta^n U_a}{n!\,h^n}$


Since $\phi(x)$ is the interpolating polynomial which represents $U_x$, the
function $\phi(x)$ can be written as $U_x$.

$\therefore U_x = U_a + (x-a)\dfrac{\Delta U_a}{h} + (x-a)(x-a-h)\dfrac{\Delta^2 U_a}{2!\,h^2} + \dots$
$\quad + (x-a)(x-a-h)\cdots(x-a-(n-1)h)\dfrac{\Delta^n U_a}{n!\,h^n}$


The above formula is known as the Newton-Gregory formula for
forward interpolation.


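Corollary 1. If we take $\dfrac{x-a}{h} = r$, so that $x = a + rh$, then the Newton-Gregory
formula for forward interpolation reduces to


$\phi(a+rh) = U_a + r\Delta U_a + \dfrac{r(r-1)}{2!}\Delta^2 U_a + \dots + \dfrac{r(r-1)\cdots(r-n+1)}{n!}\Delta^n U_a$


Note. Since $\phi(x)$ is the interpolating polynomial which represents $U_x$, the
function $\phi(a+rh)$ can be written as $U_{a+rh}$. Hence the Newton-Gregory
formula for forward interpolation becomes


$U_{a+rh} = U_a + r\Delta U_a + \dfrac{r(r-1)}{2!}\Delta^2 U_a + \dots + \dfrac{r(r-1)\cdots(r-n+1)}{n!}\Delta^n U_a$


Corollary 2. The formula given in Corollary 1 can also be used to
extrapolate in the interval $(a-h, a)$.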


The Newton-Gregory formula for forward interpolation is not well suited
for interpolating a value of $U_x$ near the end of the given data. To get
a formula for this purpose we can write the interpolating polynomial $\phi(x)$
of degree $n$ which represents $U_x$ in $a \le x \le a+nh$ as


$\phi(x) = A_0 + A_1(x - a - nh) + A_2(x - a - nh)(x - a - (n-1)h) + \dots$
$\quad + A_n(x - a - nh)(x - a - (n-1)h)\cdots(x - a - h)$
As in the proof of Theorem 10.3 we can find


$A_i = \dfrac{\nabla^i U_{a+nh}}{i!\,h^i}$, $i = 0, 1, 2, \dots, n$


Thus, $U_x = U_{a+nh} + \dfrac{\nabla U_{a+nh}}{h}(x - a - nh) + \dfrac{\nabla^2 U_{a+nh}}{2!\,h^2}(x - a - nh)(x - a - (n-1)h) + \dots$
$\quad + \dfrac{\nabla^n U_{a+nh}}{n!\,h^n}(x - a - nh)\cdots(x - a - h)$


This is known as Newton's formula for backward
interpolation.


Taking $\dfrac{x - a - nh}{h} = r$ we get $x = a + nh + rh$.


Further, using


$h = (a+nh) - [a+(n-1)h]$, $2h = (a+nh) - [a+(n-2)h]$ etc.,
the above equation can be written as


$U_{a+nh+rh} = U_{a+nh} + \dfrac{r\nabla U_{a+nh}}{1!} + \dfrac{r(r+1)\nabla^2 U_{a+nh}}{2!} + \dfrac{r(r+1)(r+2)\nabla^3 U_{a+nh}}{3!} + \dots$
$\quad + \dfrac{r(r+1)\cdots(r+n-1)\nabla^n U_{a+nh}}{n!}$

Solved Problems.

Problem 1. If $U_{75} = 246$; $U_{80} = 202$; $U_{85} = 118$ and $U_{90} = 40$,
find $U_{79}$.

Solution.


Here $a = 75$; $h = 5$. We have to find $U_{a+rh} = U_{79}$.


$\therefore a + rh = 79$. Hence $75 + 5r = 79$
$\therefore r = 4/5 = 0.8$
By the Newton-Gregory formula for equal intervals,


$U_{a+rh} = U_a + \dfrac{r}{1!}\Delta U_a + \dfrac{r(r-1)}{2!}\Delta^2 U_a + \dfrac{r(r-1)(r-2)}{3!}\Delta^3 U_a + \dots$

We require $\Delta U_a, \Delta^2 U_a, \dots$


Hence we form the difference table as given below.

$x$    $U_x$   $\Delta U_x$   $\Delta^2 U_x$   $\Delta^3 U_x$
75     246
               -44
80     202                    -40
               -84                             46
85     118                    6
               -78
90     40


$U_{79} = 246 + \dfrac{0.8(-44)}{1} + \dfrac{0.8(0.8-1)}{1 \cdot 2}(-40) + \dfrac{0.8(0.8-1)(0.8-2)}{1 \cdot 2 \cdot 3}(46)$


$= 246 - 35.2 + 3.2 + 1.472$
$= 215.472$

Problem 2. By using Gregory-Newton's formula find $U_x$ for the
following data. Hence estimate (i) $U_{1.5}$ (ii) $U_9$.

$U_0$   $U_1$   $U_2$   $U_3$   $U_4$
1       11      21      28      29

Solution. Let us form the difference table first.

$x$   $U_x$   $\Delta U_x$   $\Delta^2 U_x$   $\Delta^3 U_x$   $\Delta^4 U_x$
0     1
              10
1     11                     0
              10                              -3
2     21                     -3                                0
              7                               -3
3     28                     -6
              1
4     29


Here the third order differences are constant and hence the
required function is a polynomial of degree 3. In this case we use the
formula


$U_x = U_a + (x-a)\dfrac{\Delta U_a}{h} + (x-a)(x-a-h)\dfrac{\Delta^2 U_a}{2!\,h^2} + (x-a)(x-a-h)(x-a-2h)\dfrac{\Delta^3 U_a}{3!\,h^3}$
Here $a = 0$ and $h = 1$.


$\therefore U_x = 1 + (x-0) \times 10 + x(x-1) \times \dfrac{0}{2!} + x(x-1)(x-2) \times \dfrac{(-3)}{3!}$


$= 1 + 10x - \dfrac{1}{2}(x^3 - 3x^2 + 2x)$
$= \dfrac{1}{2}(2 + 20x - x^3 + 3x^2 - 2x)$


$\therefore U_x = \dfrac{1}{2}(-x^3 + 3x^2 + 18x + 2)$


(i) $U_{1.5} = \dfrac{1}{2}(-(1.5)^3 + 3(1.5)^2 + 18(1.5) + 2)$
$= \dfrac{1}{2}(-3.375 + 6.75 + 27 + 2)$


$= 16.188$


(ii) $U_9 = \dfrac{1}{2}(-(9)^3 + 3(9)^2 + 18(9) + 2)$


$= \dfrac{1}{2}(-729 + 243 + 162 + 2) = -161$


Problem 3. The population of a village was recorded as follows.

Year         1941   1951   1961   1971   1981   1991
Population   2500   2800   3200   3700   4350   5225

Estimate the population for the year 1945.

Solution.

Year $x$   Population $U_x$   $\Delta U_x$   $\Delta^2 U_x$   $\Delta^3 U_x$   $\Delta^4 U_x$   $\Delta^5 U_x$
1941       2500
                              300
1951       2800                              100
                              400                             0
1961       3200                              100                              50
                              500                             50                               -25
1971       3700                              150                              25
                              650                             75
1981       4350                              225
                              875
1991       5225


We have to find $U_{1945}$. Here $a = 1941$ and $h = 10$.
$\therefore U_{a+rh} = U_{1945}$
Hence $1941 + 10r = 1945$. Hence $r = 0.4$


Applying the Newton-Gregory formula we get
$U_{1945} = 2500 + 0.4 \times 300 + 0.4(0.4-1) \times \dfrac{100}{2!} + 0.4(0.4-1)(0.4-2) \times \dfrac{0}{3!}$
$\quad + 0.4(0.4-1)(0.4-2)(0.4-3) \times \dfrac{50}{4!} + 0.4(0.4-1)(0.4-2)(0.4-3)(0.4-4) \times \dfrac{(-25)}{5!}$

$= 2500 + 120 - 12 + 0 - 2.08 - 0.75$

$= 2605.17$
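
Illustration. The Newton-Gregory forward formula in the form $U_{a+rh} = U_a + r\Delta U_a + \frac{r(r-1)}{2!}\Delta^2 U_a + \dots$ can be sketched in a few lines of Python. This is not part of the original text: `newton_forward` is a hypothetical helper name, and the sketch assumes equally spaced arguments starting at `a` with interval `h`. It is applied to the data of Problems 1 and 2 above.

```python
def newton_forward(a, h, entries, x):
    """Evaluate the Newton-Gregory forward interpolating polynomial at x.

    entries are U_a, U_{a+h}, ..., U_{a+nh} (equal intervals assumed).
    """
    # Collect the leading differences U_a, ΔU_a, Δ²U_a, ...
    column = list(entries)
    leading = []
    while column:
        leading.append(column[0])
        column = [q - p for p, q in zip(column, column[1:])]

    r = (x - a) / h
    total = 0.0
    coeff = 1.0  # r(r-1)...(r-k+1)/k!, built up term by term
    for k, diff in enumerate(leading):
        total += coeff * diff
        coeff *= (r - k) / (k + 1)
    return total


u79 = newton_forward(75, 5, [246, 202, 118, 40], 79)        # Problem 1
u1_5 = newton_forward(0, 1, [1, 11, 21, 28, 29], 1.5)       # Problem 2 (i)
u9 = newton_forward(0, 1, [1, 11, 21, 28, 29], 9)           # Problem 2 (ii)
print(round(u79, 3), round(u1_5, 4), round(u9, 3))
```

Because the coefficient of each term is obtained from the previous one, no factorials need to be computed explicitly.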
Exercises. 


1. If $U_{75} = 2459$; $U_{80} = 2018$; $U_{85} = 1180$ and $U_{90} = 402$, find
$U_{79}$.


2. Estimate the expectation of life at the age of 16 years from the
following data.

Age in years                   10     15     20     25     30     35
Expectation of life in years   35.4   32.3   29.2   26     23.2   20.4

3. If $\log_{10} 5 = 0.6990$; $\log_{10} 10 = 1$; $\log_{10} 15 = 1.1761$; and
$\log_{10} 20 = 1.3010$, find $\log_{10} 12$.
4. Given that $\sin 45^\circ = 0.7071$; $\sin 50^\circ = 0.7660$;
$\sin 55^\circ = 0.8192$; $\sin 60^\circ = 0.8660$, find $\sin 52^\circ$.


5.From the following table find U45 


x 0 6 12 18 


Ux 23.1234 23.7234 24.6834 26.1330 


10.4 LAGRANGE'S FORMULA


The Newton-Gregory formula can be used for interpolation only when
we know the values of $U_x$ at points in equal intervals. The following
formula, due to the French mathematician Lagrange, can be used when
we know the values of $U_x$ at points which are not at equal intervals. This
formula also enables us to determine the form of the function $U_x$.

Theorem 10.4 (Lagrange's theorem) Let $U_{a_1}, U_{a_2}, \dots, U_{a_n}$ be the values
of $U_x$ at $a_1, a_2, \dots, a_n$ (not necessarily at equal intervals). Then an
interpolating polynomial $\phi(x)$ for $U_x$ is given by


$\phi(x) = \dfrac{(x-a_2)(x-a_3)\cdots(x-a_n)}{(a_1-a_2)(a_1-a_3)\cdots(a_1-a_n)} U_{a_1} + \dfrac{(x-a_1)(x-a_3)\cdots(x-a_n)}{(a_2-a_1)(a_2-a_3)\cdots(a_2-a_n)} U_{a_2} + \dots$
$\quad + \dfrac{(x-a_1)(x-a_2)\cdots(x-a_{n-1})}{(a_n-a_1)(a_n-a_2)\cdots(a_n-a_{n-1})} U_{a_n}$


Proof. Since $n$ values of $U_x$ are given we can assume $\phi(x)$ to be a
polynomial of degree $n-1$. Let


$\phi(x) = A_1(x-a_2)(x-a_3)\cdots(x-a_n) + A_2(x-a_1)(x-a_3)\cdots(x-a_n)$
$\quad + \dots + A_n(x-a_1)(x-a_2)\cdots(x-a_{n-1})$ ......(1)


When $x = a_1$, we get $U_{a_1} = A_1(a_1-a_2)(a_1-a_3)\cdots(a_1-a_n)$


$\therefore A_1 = \dfrac{U_{a_1}}{(a_1-a_2)(a_1-a_3)\cdots(a_1-a_n)}$


Similarly when $x = a_2, a_3, \dots, a_n$ we get


$A_2 = \dfrac{U_{a_2}}{(a_2-a_1)(a_2-a_3)\cdots(a_2-a_n)}$, ...,
$A_n = \dfrac{U_{a_n}}{(a_n-a_1)(a_n-a_2)\cdots(a_n-a_{n-1})}$
Substituting these values in (1) we get Lagrange's formula.


Note. Since $\phi(x)$ is the interpolating polynomial which represents $U_x$, the
function $\phi(x)$ can be replaced by $U_x$ in (1).


Hence Lagrange's formula becomes


$U_x = \dfrac{(x-a_2)(x-a_3)\cdots(x-a_n)}{(a_1-a_2)(a_1-a_3)\cdots(a_1-a_n)} U_{a_1} + \dfrac{(x-a_1)(x-a_3)\cdots(x-a_n)}{(a_2-a_1)(a_2-a_3)\cdots(a_2-a_n)} U_{a_2} + \dots$
$\quad + \dfrac{(x-a_1)(x-a_2)\cdots(x-a_{n-1})}{(a_n-a_1)(a_n-a_2)\cdots(a_n-a_{n-1})} U_{a_n}$

Solved Problems.
Problem 1. Find $U_5$ given that $U_1 = 4$; $U_2 = 7$; $U_4 = 13$ and $U_7 = 30$.
Solution. The arguments 1, 2, 4, 7 are not at equal intervals and we use
Lagrange's formula to find $U_5$. Take $a_1 = 1$; $a_2 = 2$; $a_3 = 4$; $a_4 = 7$ and
$x = 5$. Substituting in Lagrange's formula we get


$U_5 = \dfrac{(5-2)(5-4)(5-7)}{(1-2)(1-4)(1-7)} \times 4 + \dfrac{(5-1)(5-4)(5-7)}{(2-1)(2-4)(2-7)} \times 7$
$\quad + \dfrac{(5-1)(5-2)(5-7)}{(4-1)(4-2)(4-7)} \times 13 + \dfrac{(5-1)(5-2)(5-4)}{(7-1)(7-2)(7-4)} \times 30$

$= \dfrac{4}{3} - \dfrac{28}{5} + \dfrac{52}{3} + 4 = \dfrac{256}{15} \approx 17.07$

Problem 2. Find the form of the function $U_x$ for the following data.
Hence find $U_3$.

$x$     0    1    2    5
$U_x$   2    3    12   147


Solution. Here $a_1 = 0$; $a_2 = 1$; $a_3 = 2$; $a_4 = 5$
$\therefore U_{a_1} = 2$; $U_{a_2} = 3$; $U_{a_3} = 12$; $U_{a_4} = 147$
Applying Lagrange's formula we get


$U_x = \dfrac{(x-1)(x-2)(x-5)}{(0-1)(0-2)(0-5)} \times 2 + \dfrac{(x-0)(x-2)(x-5)}{(1-0)(1-2)(1-5)} \times 3$
$\quad + \dfrac{(x-0)(x-1)(x-5)}{(2-0)(2-1)(2-5)} \times 12 + \dfrac{(x-0)(x-1)(x-2)}{(5-0)(5-1)(5-2)} \times 147$

$= -\dfrac{x^3 - 8x^2 + 17x - 10}{5} + \dfrac{3(x^3 - 7x^2 + 10x)}{4} - 2(x^3 - 6x^2 + 5x) + \dfrac{49(x^3 - 3x^2 + 2x)}{20}$

$= \dfrac{1}{60}[x^3(-12 + 45 - 120 + 147) + x^2(96 - 315 + 720 - 441) + x(-204 + 450 - 600 + 294) + 120]$

$= \dfrac{1}{60}[60x^3 + 60x^2 - 60x + 120]$

$\therefore U_x = x^3 + x^2 - x + 2$

$\therefore U_3 = 3^3 + 3^2 - 3 + 2$

$= 35$

Problem 3. Determine by Lagrange's formula the percentage number of
criminals under 35 years.

Age               % number of criminals
Under 25 years    52.0
Under 30 years    67.3
Under 40 years    84.1
Under 50 years    94.4


Solution. We have to find $U_{35}$.
Here $a_1 = 25$; $a_2 = 30$; $a_3 = 40$; $a_4 = 50$
$\therefore U_{a_1} = 52$; $U_{a_2} = 67.3$; $U_{a_3} = 84.1$; $U_{a_4} = 94.4$
Applying Lagrange's formula we get


$U_{35} = \dfrac{(35-30)(35-40)(35-50)}{(25-30)(25-40)(25-50)} \times 52 + \dfrac{(35-25)(35-40)(35-50)}{(30-25)(30-40)(30-50)} \times 67.3$
$\quad + \dfrac{(35-25)(35-30)(35-50)}{(40-25)(40-30)(40-50)} \times 84.1 + \dfrac{(35-25)(35-30)(35-40)}{(50-25)(50-30)(50-40)} \times 94.4$


$= \dfrac{(-1) \times 52.0}{5} + \dfrac{3 \times 67.3}{4} + \dfrac{1 \times 84.1}{2} - \dfrac{1 \times 94.4}{20}$


$= -10.40 + 50.475 + 42.05 - 4.72$


$= 77.405$


Hence the estimated % number of criminals under 35 years is about 77.4.
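
Illustration. Lagrange's formula is a sum of products and translates directly into a double loop. The sketch below is not part of the original text; `lagrange` is a hypothetical helper name, and integer arguments and entries are assumed so that exact rational arithmetic can be used. It is applied to the data of Problems 1 and 2 of this section.

```python
from fractions import Fraction


def lagrange(points, x):
    """Evaluate Lagrange's interpolating polynomial at x.

    points is a list of (a_i, U_{a_i}) pairs with distinct integer a_i;
    the arguments need not be equally spaced.
    """
    total = Fraction(0)
    for i, (a_i, u_i) in enumerate(points):
        term = Fraction(u_i)
        for j, (a_j, _) in enumerate(points):
            if j != i:
                # Factor (x - a_j) / (a_i - a_j) of the i-th term.
                term *= Fraction(x - a_j, a_i - a_j)
        total += term
    return total


u5 = lagrange([(1, 4), (2, 7), (4, 13), (7, 30)], 5)      # Problem 1
u3 = lagrange([(0, 2), (1, 3), (2, 12), (5, 147)], 3)     # Problem 2
print(u5, float(u5), u3)
```

At a tabulated argument every term but one vanishes, so the function returns the tabulated entry itself; this is a quick check that the formula has been coded correctly.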


Problem 4. Prove that Lagrange's formula can be put in the form


$U_x = \sum_{r=1}^{n} \dfrac{\phi(x)\,U_{a_r}}{(x-a_r)\,\phi'(a_r)}$, where $\phi'(x) = \dfrac{d}{dx}\phi(x)$ and


$\phi(x) = \prod_{r=1}^{n}(x-a_r) = (x-a_1)(x-a_2)\cdots(x-a_n)$
Solution. Lagrange's formula is given by


$U_x = \dfrac{(x-a_2)(x-a_3)\cdots(x-a_n)}{(a_1-a_2)(a_1-a_3)\cdots(a_1-a_n)} U_{a_1} + \dfrac{(x-a_1)(x-a_3)\cdots(x-a_n)}{(a_2-a_1)(a_2-a_3)\cdots(a_2-a_n)} U_{a_2} + \dots$
$\quad + \dfrac{(x-a_1)(x-a_2)\cdots(x-a_{n-1})}{(a_n-a_1)(a_n-a_2)\cdots(a_n-a_{n-1})} U_{a_n}$ ......(1)

Now $\phi(x) = (x-a_1)(x-a_2)\cdots(x-a_n)$ ......(2)


$\log \phi(x) = \sum_{i=1}^{n} \log(x-a_i)$


Differentiating w.r.t. $x$ we get


$\dfrac{\phi'(x)}{\phi(x)} = \sum_{i=1}^{n} \dfrac{1}{x-a_i}$


$\therefore \phi'(x) = \sum_{i=1}^{n} \dfrac{(x-a_1)(x-a_2)\cdots(x-a_n)}{x-a_i}$
$= (x-a_2)(x-a_3)\cdots(x-a_n) + (x-a_1)(x-a_3)\cdots(x-a_n)$
$\quad + \dots + (x-a_1)(x-a_2)\cdots(x-a_{n-1})$

$\therefore \phi'(a_1) = (a_1-a_2)(a_1-a_3)\cdots(a_1-a_n)$

$\phi'(a_2) = (a_2-a_1)(a_2-a_3)\cdots(a_2-a_n)$, ...,

$\phi'(a_n) = (a_n-a_1)(a_n-a_2)\cdots(a_n-a_{n-1})$
Substituting these values in (1) and using (2) we get


$U_x = \dfrac{\phi(x)U_{a_1}}{(x-a_1)\phi'(a_1)} + \dfrac{\phi(x)U_{a_2}}{(x-a_2)\phi'(a_2)} + \dots + \dfrac{\phi(x)U_{a_n}}{(x-a_n)\phi'(a_n)}$


Exercises. 


1. The following table gives the normal weight of a baby during the
first 6 months of life.

Age in months    0    2    3    5    6
Weight in lbs    5    7    8    10   12

Find the estimated weight of the baby at the age of 4 months.


2. Apply Lagrange's formula to find U, from the data Ug = 2; 
Ui = 5; U3 = 29; U; = —19 


3. The following table gives the premium payable at age in years
completed. Interpolate the premium payable at age 35
completed.

Age completed     25   30   40   60
Premium in Rs.    50   55   70   95


4. Given $\log_{10} 654 = 2.8156$;
$\log_{10} 658 = 2.8182$;
$\log_{10} 659 = 2.8189$;
$\log_{10} 661 = 2.8202$;
find $\log_{10} 656$.


UNIT-11. THEORY OF ATTRIBUTES


11.1 INTRODUCTION 


Statistics chiefly deals with collection of data, classification of
data based on certain characteristics, calculation of statistical constants
such as mean, median, mode, standard deviation etc., and analysis of
data based on the statistical constants. The characteristics used for
classification of data may be quantitative or qualitative in
nature. For example, when we consider the set of students in a class,
their heights and weights are characteristics which are quantitative 
whereas their efficiency, intelligence, health and social status are 
characteristics which are qualitative. The qualitative characteristics of a 
population are called attributes and they cannot be measured by numeric 
quantities. Hence the statistical treatment required for attributes is 
different from that of quantitative characteristics. In this chapter we 
develop the statistical techniques used in the theory of attributes. 


11.2 ATTRIBUTES 


Suppose the population is divided into two classes according to
the presence or absence of a single attribute. The positive class denotes
the presence of the attribute and the negative class denotes the absence
of the attribute. Capital Roman letters such as A, B, C, ... are used to denote
positive classes and the corresponding lower case Greek letters such as
α, β, γ, ... are used to denote negative classes. For example, if A
represents the attribute richness, then α represents the attribute
non-richness (poor).


The combinations of attributes are denoted by grouping together
the letters concerned.


For example, if attribute A represents health and B represents
wealth then AB represents the possession of both health and wealth; Aβ
represents health and non-wealth; αB represents non-health and wealth;
αβ represents non-health and non-wealth. A convenient way of representing
two attributes in a 2 x 2 table is as follows.


Attributes    B     β
A             AB    Aβ
α             αB    αβ

A class represented by n attributes is called a class of nth order.


For example, A, B, C, α, β, γ are all of first order; AB, Aβ, αB, αβ
are of second order; and ABC, ABγ, AβC, αβγ are of the third order.


The number of individuals possessing the attributes in a class of
nth order is called a class frequency of order n, and class frequencies are
denoted by bracketing the attributes.


Thus (A) stands for the frequency of A, i.e. the number of individuals
possessing the attribute A, and (Aβ) stands for the number of
individuals possessing the attribute A and not B.


Note 1. Class frequencies of the type (A), (AB), (ABC), ... are known as
positive class frequencies.


Class frequencies of the type (α), (β), (αβ), (αβγ), ... are known
as negative class frequencies.


Class frequencies of the type (αB), (Aβ), (Aβγ), (αBC), ... are
known as contrary frequencies.


Note 2. If N is the total number of observations in a population (i.e., N
is the total frequency) without any specification of attributes, then N
is considered to be a frequency of order zero.


The class frequencies for two attributes can be represented
in the form of a table as shown below.


Attributes    B       β       Total
A             (AB)    (Aβ)    (A)
α             (αB)    (αβ)    (α)
Total         (B)     (β)     N


N denotes the total number in the population.


In a population of size N, the relations between the class
frequencies of various orders are given below.


N = (A) + (α) = (B) + (β) = (C) + (γ) etc.,
(A) = (AB) + (Aβ)
(B) = (AB) + (αB)    ......(1)
(α) = (αB) + (αβ)
(β) = (Aβ) + (αβ)    ......(2)
N = (A) + (α)
N = (B) + (β)  =>  N = (AB) + (Aβ) + (αB) + (αβ)
from (1) and (2).


For three attributes A, B and C we get similar results as shown below.


(A) = (ABC) + (ABγ) + (AβC) + (Aβγ)
(B) = (ABC) + (ABγ) + (αBC) + (αBγ)
(C) = (ABC) + (AβC) + (αBC) + (αβC)
(AB) = (ABC) + (ABγ)
(Aβ) = (AβC) + (Aβγ)
(αB) = (αBC) + (αBγ)
(αβ) = (αβC) + (αβγ) etc.,


Note. Any class frequency can be expressed in terms of frequencies of
higher order.


The following table gives the class frequencies of all orders and
the total number of all class frequencies up to 3 attributes.

Attributes   Order   Class frequencies                                    Number in each order   Total
A            0       N                                                    1
             1       (A), (α)                                             2                      3
A, B         0       N                                                    1
             1       (A), (B), (α), (β)                                   4
             2       (AB), (Aβ), (αB), (αβ)                               4                      9
A, B, C      0       N                                                    1
             1       (A), (B), (C), (α), (β), (γ)                         6
             2       (AB), (Aβ), (αB), (αβ),
                     (AC), (Aγ), (αC), (αγ),
                     (BC), (Bγ), (βC), (βγ)                               12
             3       (ABC), (ABγ), (AβC), (Aβγ),
                     (αBC), (αBγ), (αβC), (αβγ)                           8                      27


The classes of highest order are called the ultimate classes and
their frequencies are called the ultimate class frequencies.

Theorem 11.1 Given n attributes,
(i) the total number of class frequencies is $3^n$
(ii) the total number of positive class frequencies is $2^n$
(iii) the total number of negative class frequencies is $2^n - 1$


Proof. (i) The number of ways of choosing r attributes from the given set
of n attributes is $\binom{n}{r}$. Since each attribute gives two symbols (one for
positive and one for negative), the number of class frequencies of order r
that can be obtained from r attributes is $2^r$.


Hence the total number of class frequencies of order r is $\binom{n}{r} 2^r$.


Thus the total number of class frequencies (of all orders)
$= \sum_{r=0}^{n} \binom{n}{r} 2^r = 1 + \binom{n}{1} 2 + \binom{n}{2} 2^2 + \dots + 2^n$
$= (1 + 2)^n = 3^n$


(ii) Any collection of r attributes gives rise to only one positive class frequency
of order r (all attributes possessed).


Hence the total number of positive class frequencies (of all orders)
$= \sum_{r=0}^{n} \binom{n}{r} = 1 + \binom{n}{1} + \binom{n}{2} + \dots + \binom{n}{n}$


$= (1 + 1)^n = 2^n$


(iii) There is no negative class frequency of order 0. Any collection of r
attributes gives rise to one negative class frequency of order r (no attribute
possessed).


Hence the total number of negative class frequencies (of all orders)
$= \sum_{r=1}^{n} \binom{n}{r} = 2^n - 1$

Dichotomisation is the process of dividing a collection of
objects into two classes according to the possession or non-possession of
an attribute.


Suppose a population consists of N objects. If A is an attribute we
have N = (A) + (α).


=> N = A.N + α.N


= (A + α).N
=> 1 = A + α


Thus in symbolic expressions A can be replaced by 1 - α and α
by 1 - A. This concept is useful to express any class frequency in
terms of higher order class frequencies and ultimately in terms of
ultimate class frequencies (refer to examples 1 and 2 below). Also it is useful
to express positive class frequencies (negative class frequencies)
in terms of negative class frequencies (positive class
frequencies) (refer to examples 3, 4 and 5 below).


Examples
1. (AB) = (ABC) + (ABγ)
Consider (ABγ) = ABγ.N = AB(1 - C).N
= AB.N - ABC.N
= (AB) - (ABC)
∴ (AB) = (ABC) + (ABγ)
2. If there are two attributes A and B we have
N = (A) + (α) = (B) + (β)
Hence N = (A) + (α) = (AB) + (Aβ) + (αB) + (αβ)
and N = (B) + (β) = (AB) + (αB) + (Aβ) + (αβ)
If there are three attributes A, B, C we have
N = (A) + (α)
=> N = (AB) + (Aβ) + (αB) + (αβ). Thus
N = (ABC) + (ABγ) + (AβC) + (Aβγ) + (αBC) +
(αBγ) + (αβC) + (αβγ).
3. Consider two attributes A and B.
Now (αβ) = αβ.N = (1 - A)(1 - B).N
= (1 - A - B + AB).N
= N - A.N - B.N + AB.N
= N - (A) - (B) + (AB)


Here a negative class frequency has been expressed in terms of
positive class frequencies.
4. (AB) = AB.N
= (1 - α)(1 - β).N = (1 - α - β + αβ).N
= N - α.N - β.N + αβ.N = N - (α) - (β) + (αβ)


Here a positive class frequency has been expressed in terms of
negative class frequencies.


5. (αβγ) = αβγ.N = (1 - A)(1 - B)(1 - C).N
= N - A.N - B.N - C.N + AB.N + AC.N + BC.N - ABC.N
= N - (A) - (B) - (C) + (AB) + (AC) + (BC) - (ABC)

Note. N = (A) + (B) + (C) - (AB) - (BC) - (AC) + (ABC) + (αβγ)


Solved problems.
Problem 1. Given (A) = 30; (B) = 25; (α) = 30; (αβ) = 20,
find (i) N (ii) (β) (iii) (AB) (iv) (Aβ) (v) (αB)
Solution. (i) N = (A) + (α) = 30 + 30 = 60
(ii) (β) = N - (B) = 60 - 25 = 35
(iii) (AB) = AB.N = (1 - α)(1 - β).N
= N - (α) - (β) + (αβ)
= 60 - 30 - 35 + 20 = 15
(iv) (Aβ) = Aβ.N = A(1 - B).N = (A) - (AB)
= 30 - 15 = 15
(v) (αB) = αB.N = (1 - A)B.N = (B) - (AB)
= 25 - 15 = 10


Note. The result can also be obtained directly by completing the 2 x 2
contingency table for the attributes A and B.


          (B)    (β)    Total
(A)       -      -      30
(α)       -      20     30
Total     25     -      -

Problem 2. Given the following ultimate class frequencies of two
attributes A and B, find the positive and negative class
frequencies and the total number of observations.


(AB) = 975; (αB) = 100; (Aβ) = 25; (αβ) = 950
Solution. Positive class frequencies are (A) and (B).

(A) = (AB) + (Aβ) = 975 + 25 = 1000

(B) = (AB) + (αB) = 975 + 100 = 1075
Negative class frequencies are (α) and (β).

(α) = (αB) + (αβ) = 100 + 950 = 1050

(β) = (Aβ) + (αβ) = 25 + 950 = 975

N = (A) + (α) = (B) + (β)

Taking N = (A) + (α) = 1000 + 1050 = 2050


Note. The results can also be obtained directly by completing the 2 x 2
contingency table for the attributes A and B.

Problem 3. Given the following positive class frequencies, find the
remaining class frequencies: N = 20; (A) = 9; (B) = 12; (C) = 8;
(AB) = 6; (BC) = 4; (CA) = 4; (ABC) = 3

Solution. There are three attributes A, B, C.


The total number of class frequencies is $3^3 = 27$.


We are given only 8 class frequencies and we have to find the
remaining 19 class frequencies. They are


Order 1. (α) = N - (A) = 20 - 9 = 11
(β) = N - (B) = 20 - 12 = 8
(γ) = N - (C) = 20 - 8 = 12
Order 2. (Aβ) = A(1 - B).N = (A) - (AB) = 9 - 6 = 3
(αB) = (1 - A)B.N = (B) - (AB) = 12 - 6 = 6
(Aγ) = A(1 - C).N = (A) - (AC) = 9 - 4 = 5
(αC) = (1 - A)C.N = (C) - (AC) = 8 - 4 = 4
(Bγ) = B(1 - C).N = (B) - (BC) = 12 - 4 = 8
(βC) = (1 - B)C.N = (C) - (BC) = 8 - 4 = 4
(αβ) = (1 - A)(1 - B).N = N - (A) - (B) + (AB)
= 20 - 9 - 12 + 6 = 5
(βγ) = (1 - B)(1 - C).N = N - (B) - (C) + (BC)
= 20 - 12 - 8 + 4 = 4
(αγ) = (1 - A)(1 - C).N = N - (A) - (C) + (AC)
= 20 - 9 - 8 + 4 = 7
Order 3. (ABγ) = AB(1 - C).N = (AB) - (ABC) = 6 - 3 = 3
(AβC) = A(1 - B)C.N = (AC) - (ABC) = 4 - 3 = 1
(Aβγ) = A(1 - B)(1 - C).N = (A) - (AC) - (AB) + (ABC)
= 9 - 4 - 6 + 3 = 2
(αBC) = (1 - A)BC.N = (BC) - (ABC)
= 4 - 3 = 1
(αBγ) = (1 - A)B(1 - C).N = (B) - (BC) - (AB) + (ABC)
= 12 - 4 - 6 + 3 = 5
(αβC) = (1 - A)(1 - B)C.N = (C) - (AC) - (BC) + (ABC)
= 8 - 4 - 4 + 3 = 3
(αβγ) = (1 - A)(1 - B)(1 - C).N = N - (A) - (B) -
(C) + (AB) + (BC) + (CA) - (ABC)
= 20 - 9 - 12 - 8 + 6 + 4 + 4 - 3 = 2
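
Illustration. The symbolic rule "replace each Greek letter by 1 minus the Roman letter and expand" can be carried out mechanically. The sketch below is not part of the original text: `ultimate_frequencies` is a hypothetical helper, positive class frequencies are stored in a dictionary keyed by the set of attributes (the empty set standing for N), and lower case Roman letters are used in the output labels in place of α, β, γ purely for convenience. It is applied to the data of Problem 3.

```python
from itertools import combinations, product


def ultimate_frequencies(positive, attributes="ABC"):
    """Derive all ultimate class frequencies from the positive ones.

    positive maps frozenset of attribute letters -> frequency, with
    frozenset() -> N. Labels use lower case for absent attributes,
    e.g. 'Abc' stands for (Aβγ).
    """
    result = {}
    for signs in product([True, False], repeat=len(attributes)):
        present = [a for a, s in zip(attributes, signs) if s]
        absent = [a for a, s in zip(attributes, signs) if not s]
        freq = 0
        # Expand (present letters) * Π(1 - absent letter) . N term by term.
        for k in range(len(absent) + 1):
            for extra in combinations(absent, k):
                freq += (-1) ** k * positive[frozenset(present) | frozenset(extra)]
        label = "".join(a if s else a.lower() for a, s in zip(attributes, signs))
        result[label] = freq
    return result


given = {
    frozenset(): 20,
    frozenset("A"): 9, frozenset("B"): 12, frozenset("C"): 8,
    frozenset("AB"): 6, frozenset("BC"): 4, frozenset("AC"): 4,
    frozenset("ABC"): 3,
}
ultimate = ultimate_frequencies(given)
print(ultimate)
print("sum of ultimate class frequencies:", sum(ultimate.values()))
```

The eight ultimate class frequencies add up to N, which is a useful check on the arithmetic of the solution.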
Percentage of tube lights which pass the four tests

= (4615 / 5000) x 100

= 92.3 %

Exercise
1. Given the frequencies (A) = 1150; (α) = 1120; (AB) = 1075;
(αβ) = 985, find the remaining class frequencies and the total
number of observations.


2. Given the following ultimate class frequencies, find the
frequencies of the positive and negative classes and the total number
of observations.


(AB) = 733; (Aβ) = 840; (αB) = 699; (αβ) = 783


3. Given the following data, find the frequencies of (i) the remaining
positive classes (ii) the ultimate classes.


N = 1800; (A) = 850; (B) = 780; (C) = 326
(ABγ) = 200; (AβC) = 94; (αBC) = 72; (ABC) = 50


4. Given the following ultimate class frequencies, find the
frequencies of the positive classes.


(ABC) = 298; (AβC) = 450; (αBC) = 408; (αβC) = 342
(ABγ) = 1476; (Aβγ) = 2292; (αBγ) = 3524; (αβγ) = 43684

11.3 CONSISTENCY OF DATA 


Consider a population with the attributes A and B. For data
observed in the same population, (AB) cannot be greater than (A). Thus
the figures (A) = 20 and (AB) = 25 are inconsistent. We observe that for
the above figures, (Aβ) = (A) - (AB) = -5, which is negative. This
motivates the following definition.


Definition: A set of class frequencies is said to be consistent if none of
them is negative. Otherwise the given set of class frequencies is said to
be inconsistent.


Since any class frequency can be expressed as the sum of ultimate
class frequencies, it follows that a set of independent class frequencies is
consistent if and only if no ultimate class frequency is negative.


We have the following set of criteria for testing the consistency in 
the set of single attribute, two attributes and three attributes. 
Attributes | Conditions of consistency | Equivalent positive class conditions | Number of conditions
A | (A) ≥ 0 | (A) ≥ 0 | 2
 | (α) ≥ 0 | (A) ≤ N (since (α) = (1 - A).N ≥ 0) |
A, B | (AB) ≥ 0 | (AB) ≥ 0 | 2² = 4
 | (Aβ) ≥ 0 | (AB) ≤ (A) |
 | (αB) ≥ 0 | (AB) ≤ (B) |
 | (αβ) ≥ 0 | (AB) ≥ (A) + (B) - N |
A, B, C | (ABC) ≥ 0 | (i) (ABC) ≥ 0 | 2³ = 8
 | (ABγ) ≥ 0 | (ii) (ABC) ≤ (AB) |
 | (AβC) ≥ 0 | (iii) (ABC) ≤ (AC) |
 | (αBC) ≥ 0 | (iv) (ABC) ≤ (BC) |
 | (Aβγ) ≥ 0 | (v) (ABC) ≥ (AB) + (AC) - (A) |
 | (αBγ) ≥ 0 | (vi) (ABC) ≥ (AB) + (BC) - (B) |
 | (αβC) ≥ 0 | (vii) (ABC) ≥ (AC) + (BC) - (C) |
 | (αβγ) ≥ 0 | (viii) (ABC) ≤ (AB) + (BC) + (AC) - (A) - (B) - (C) + N |

Note 1. In the case of 3 attributes, the positive class conditions combine as follows.


(i) and (viii) => (AB) + (BC) + (AC) ≥ (A) + (B) + (C) - N ........ (ix)
Similarly,
(ii) and (vii) => (AC) + (BC) - (AB) ≤ (C) ........ (x)
(iii) and (vi) => (AB) + (BC) - (AC) ≤ (B) ........ (xi)
(iv) and (v) => (AB) + (AC) - (BC) ≤ (A) ........ (xii)
Conditions (ix) to (xii) can be used to check the consistency of the data when the class frequencies of the first and second order alone are known.


Note 2. If the given data are incomplete, so that it is not possible to determine all the class frequencies, then the conditions of consistency can be used to determine the limits within which an unknown class frequency can lie.


Solved problems
Problem 1. Find whether the following data are consistent.
N = 600; (A) = 300; (B) = 400; (AB) = 50
Solution. We calculate the ultimate class frequencies (αβ), (αB) and (Aβ).
(αβ) = αβ.N = (1 - A)(1 - B).N = N - (A) - (B) + (AB)
= 600 - 300 - 400 + 50 = -50
Since (αβ) < 0, the data are inconsistent.
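
The two-attribute consistency test in Problem 1 can be sketched in a few lines. This is an illustrative sketch only: the helper names `two_attribute_ultimates` and `is_consistent` are assumptions introduced here, not notation from the text.

```python
def two_attribute_ultimates(N, A, B, AB):
    """Ultimate class frequencies for two attributes A and B."""
    return {
        "AB": AB,
        "Aβ": A - AB,
        "αB": B - AB,
        "αβ": N - A - B + AB,
    }


def is_consistent(frequencies):
    """Data are consistent when no ultimate class frequency is negative."""
    return all(value >= 0 for value in frequencies.values())


# Data of Problem 1: N = 600, (A) = 300, (B) = 400, (AB) = 50.
u = two_attribute_ultimates(N=600, A=300, B=400, AB=50)
print(u)
print("consistent:", is_consistent(u))
```

Checking all four ultimate frequencies, rather than only (αβ), also catches cases such as (AB) > (A).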


Problem 2. Show that there is some error in the following data: 50% of the people are wealthy and healthy, 35% are wealthy but not healthy, 20% are healthy but not wealthy.


Solution. Taking 'wealth' as A and 'health' as B we get the following data: N = 100; (AB) = 50; (Aβ) = 35; (αB) = 20


To check the consistency of the data we find (αβ).
(αβ) = αβ.N = (1 - A)(1 - B).N
= N - (A) - (B) + (AB)
But (A) = (AB) + (Aβ) = 50 + 35 = 85
(B) = (AB) + (αB) = 50 + 20 = 70
∴ (αβ) = 100 - 85 - 70 + 50 = -5
∴ (αβ) < 0
Hence there is an error in the data.


Problem 3. Of 2000 people consulted, 1854 speak Tamil; 1507 speak Hindi; 572 speak English; 676 speak Tamil and Hindi; 286 speak Tamil and English; 270 speak Hindi and English; 114 speak Tamil, Hindi and English. Show that the information as it stands is incorrect.
Solution.


Let A, B, C denote the attributes of speaking Tamil, Hindi and English respectively.


Given N = 2000; (A) = 1854; (B) = 1507; (C) = 572;
(AB) = 676; (AC) = 286; (BC) = 270; (ABC) = 114
Consider (αβγ) = αβγ.N
= (1 - A)(1 - B)(1 - C).N
= N - (A) - (B) - (C) + (AB) + (BC) + (AC) - (ABC)
= 2000 - 1854 - 1507 - 572 + 676 + 270 + 286 - 114
= -815
∴ (αβγ) < 0. Hence the data are inconsistent.


∴ The information is incorrect.


Problem 4. Find the limits of (BC) for the following available data:


N = 125; (A) = 48; (B) = 62; (C) = 45; (Aβ) = 7 and (Aγ) = 18
Solution. First of all we find (AB) and (AC).
(AB) = (A) - (Aβ) = 48 - 7 = 41
(AC) = (A) - (Aγ) = 48 - 18 = 30
Now, by the condition of consistency (ix),
(AB) + (BC) + (AC) ≥ (A) + (B) + (C) - N
=> 41 + (BC) + 30 ≥ 48 + 62 + 45 - 125
∴ (BC) ≥ -41 ........ (i)
Also using (xii), (AB) + (AC) - (BC) ≤ (A)
=> (BC) ≥ (AB) + (AC) - (A) = 41 + 30 - 48 = 23
∴ (BC) ≥ 23 ........ (ii)
Using (xi), (AB) + (BC) - (AC) ≤ (B)
=> (BC) ≤ (B) + (AC) - (AB) = 62 + 30 - 41 = 51
∴ (BC) ≤ 51 ........ (iii)
Using (x), (AC) + (BC) - (AB) ≤ (C)
=> (BC) ≤ (C) + (AB) - (AC) = 45 + 41 - 30 = 56
∴ (BC) ≤ 56 ........ (iv)
From (i), (ii), (iii) and (iv), taking the greatest lower bound and the least upper bound, we find 23 ≤ (BC) ≤ 51.
Problem 5. Find the greatest and least values of (ABC) if (A) = 50; (B) = 60; (C) = 80; (AB) = 35; (AC) = 45 and (BC) = 42.
Solution.


The problem involves 3 attributes and we are given the positive class frequencies of the first order and second order only.


Using the positive class conditions (ii), (iii), (iv) of consistency for 3 attributes,


(ABC) ≤ (AB) => (ABC) ≤ 35
(ABC) ≤ (BC) => (ABC) ≤ 42
(ABC) ≤ (AC) => (ABC) ≤ 45
Thus (ABC) ≤ 35 ........ (1)

Using (v), (vi) and (vii),

(ABC) ≥ (AB) + (AC) - (A) => (ABC) ≥ 35 + 45 - 50 = 30

(ABC) ≥ (AB) + (BC) - (B) => (ABC) ≥ 35 + 42 - 60 = 17

(ABC) ≥ (AC) + (BC) - (C) => (ABC) ≥ 45 + 42 - 80 = 7


Thus (ABC) ≥ 30, (ABC) ≥ 17 and (ABC) ≥ 7, so that (ABC) ≥ 30 ........ (2)


From (1) and (2) we get 30 ≤ (ABC) ≤ 35.


∴ The least value of (ABC) is 30 and the greatest value of (ABC) is 35.
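
The limits found in Problem 5 follow directly from conditions (i) to (viii). A minimal sketch under that assumption is given below; the function name `abc_bounds` and the optional `N` argument are illustrative choices, not part of the text.

```python
def abc_bounds(A, B, C, AB, AC, BC, N=None):
    """Least and greatest values of (ABC) allowed by the consistency
    conditions (i)-(viii); condition (viii) is used only when N is known."""
    lower = max(0, AB + AC - A, AB + BC - B, AC + BC - C)
    upper_candidates = [AB, AC, BC]
    if N is not None:
        upper_candidates.append(AB + BC + AC - A - B - C + N)
    return lower, min(upper_candidates)


# Data of Problem 5 (N is not given there).
least, greatest = abc_bounds(A=50, B=60, C=80, AB=35, AC=45, BC=42)
print(least, greatest)
```

If the lower limit exceeds the upper limit, the given first and second order frequencies are themselves inconsistent.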
Exercise
1. Examine the consistency of the data when
(i) (A) = 800, (B) = 700, (AB) = 660, N = 1000
(ii) (A) = 600, (B) = 500, (AB) = 50, N = 1000
(iii) N = 2100, (A) = 1000, (B) = 1300, (AB) = 1100
(iv) N = 100, (A) = 45, (B) = 55, (C) = 50, (AB) = 15, (BC) = 25, (AC) = 20, (ABC) = 12
(v) N = 1800, (A) = 850, (B) = 780, (C) = 326, (AB) = 250, (BC) = 122, (AC) = 144, (ABC) = 50


2. If (A) = 50, (B) = 60, (C) = 50, (Aβ) = 5, (Aγ) = 20 and N = 100, find the least and greatest values of (BC).


3. Given that (A) = (B) = (C) = N/2 = 50; (AB) = 30, (AC) = 25, find the limits within which (BC) will lie.
4. If N = 120, (A) = 60, (B) = 90, (C) = 30, (BC) = 15, (AC) = 15, find the limits between which (AB) must lie.


11.4 INDEPENDENCE AND ASSOCIATION OF DATA 


Two attributes A and B are said to be independent if there is the same proportion of A's amongst the B's as amongst the β's or, equivalently, if the proportion of B's amongst the A's is the same as amongst the α's.


Thus A and B are independent if


(AB)/(B) = (Aβ)/(β) ........ (i)   (or)   (AB)/(A) = (αB)/(α) ........ (ii)


From (i) we get


(AB)/(B) = (Aβ)/(β) = [(AB) + (Aβ)] / [(B) + (β)] = (A)/N


∴ (AB) = (A)(B)/N ........ (1) and (Aβ) = (A)(β)/N ........ (2)


Again from (i) we get 1 - (AB)/(B) = 1 - (Aβ)/(β)


∴ [(B) - (AB)]/(B) = [(β) - (Aβ)]/(β)


∴ (αB)/(B) = (αβ)/(β) = [(αB) + (αβ)] / [(B) + (β)] = (α)/N


∴ (αB) = (α)(B)/N ........ (3) and (αβ) = (α)(β)/N ........ (4)


(1), (2), (3), (4) are all equivalent conditions for independence of the attributes A and B.


Note. In terms of second order class frequencies we get the condition of independence as (AB)(αβ) = (Aβ)(αB).


For if A and B are independent attributes we have
(AB)(αβ) = [(A)(B)/N][(α)(β)/N] = [(A)(β)/N][(α)(B)/N] = (Aβ)(αB)
Thus A and B are independent if (AB)(αβ) - (Aβ)(αB) = 0 ........ (5)
Association and coefficient of association


If (AB) ≠ (A)(B)/N we say that A and B are associated. There are two possibilities. If (AB) > (A)(B)/N we say that A and B are positively associated, and if (AB) < (A)(B)/N we say that A and B are negatively associated.


Notation. Let us denote δ = (AB) - (A)(B)/N.


Thus δ = (1/N)[N(AB) - (A)(B)]
= (1/N)[{(AB) + (Aβ) + (αB) + (αβ)}(AB) - {(AB) + (Aβ)}{(AB) + (αB)}]
= (1/N)[(AB)(αβ) - (Aβ)(αB)] ........ (6)


The notes below use Yule's coefficient of association
Q = [(AB)(αβ) - (Aβ)(αB)] / [(AB)(αβ) + (Aβ)(αB)]
and the coefficient of colligation
Y = (1 - √x)/(1 + √x), where x = (Aβ)(αB) / [(AB)(αβ)].


Note 1. We know that A and B are independent if δ = 0.
∴ A and B are independent if Q = Y = 0.


Note 2. If A and B are perfectly associated then (AB) = (A), hence (Aβ) = 0, or (AB) = (B), hence (αB) = 0. In either case Q = 1 = Y.
Note 3. If A and B are perfectly disassociated then either (AB) = 0 or (αβ) = 0, and in this case Q = -1 = Y.
Thus we get -1 ≤ Q ≤ 1.


Note. Yule's coefficient Q and the coefficient of colligation Y are related by the relation Q = 2Y/(1 + Y²).

Proof. Let x = (Aβ)(αB) / [(AB)(αβ)]. Hence Y = (1 - √x)/(1 + √x).

1 + Y² = [(1 + √x)² + (1 - √x)²] / (1 + √x)² = 2(1 + x)/(1 + √x)²

∴ 2Y/(1 + Y²) = [2(1 - √x)/(1 + √x)] × [(1 + √x)² / 2(1 + x)]
= (1 - √x)(1 + √x)/(1 + x)
= (1 - x)/(1 + x)
= {1 - (Aβ)(αB)/[(AB)(αβ)]} / {1 + (Aβ)(αB)/[(AB)(αβ)]}
= [(AB)(αβ) - (Aβ)(αB)] / [(AB)(αβ) + (Aβ)(αB)]
= Q


From the above relationship between Y and Q we infer the following.


Q = 0 => Y = 0; Q = -1 => Y = -1 and
Q = 1 => Y = 1, and conversely.
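
The relation Q = 2Y/(1 + Y²) can be checked numerically. The sketch below is illustrative only: the function names `yule_q` and `colligation_y` and the argument names (`Ab` for (Aβ), `aB` for (αB), `ab` for (αβ)) are assumptions, and the four frequencies are sample figures. Y is written in the equivalent form (√((AB)(αβ)) - √((Aβ)(αB))) / (√((AB)(αβ)) + √((Aβ)(αB))), which avoids dividing by (AB)(αβ).

```python
from math import isclose, sqrt


def yule_q(AB, Ab, aB, ab):
    """Yule's coefficient of association Q."""
    return (AB * ab - Ab * aB) / (AB * ab + Ab * aB)


def colligation_y(AB, Ab, aB, ab):
    """Coefficient of colligation Y = (1 - sqrt(x)) / (1 + sqrt(x)),
    with x = (Ab)(aB) / ((AB)(ab)), rearranged to avoid dividing by (AB)(ab)."""
    r1 = sqrt(AB * ab)
    r2 = sqrt(Ab * aB)
    return (r1 - r2) / (r1 + r2)


# Sample 2 x 2 frequencies: (AB), (Aβ), (αB), (αβ).
Q = yule_q(200, 50, 110, 600)
Y = colligation_y(200, 50, 110, 600)
print(round(Q, 5), round(Y, 5))

# Numerical check of the relation Q = 2Y / (1 + Y^2).
print(isclose(Q, 2 * Y / (1 + Y * Y)))
```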
Solved Problems.


Problem 1. Check whether the attributes A and B are independent given that
(i) (A) = 30, (B) = 60, (AB) = 12, N = 150
(ii) (AB) = 256, (αB) = 768, (Aβ) = 48, (αβ) = 144


Solution. (i) Since the given class frequencies are of the first order, the condition for independence is (AB) = (A)(B)/N.
Consider (A)(B)/N = (30 × 60)/150 = 12 = (AB)
∴ (AB) = (A)(B)/N. Hence A and B are independent.
(ii) (A) = (AB) + (Aβ) = 256 + 48 = 304
(B) = (AB) + (αB) = 256 + 768 = 1024
(α) = (αB) + (αβ) = 768 + 144 = 912
N = (A) + (α) = 304 + 912 = 1216
Now, (A)(B)/N = (304 × 1024)/1216 = 256 = (AB)
∴ (AB) = (A)(B)/N. Hence A and B are independent.


Aliter. Applying the condition (5) for independence,

(AB)(αβ) - (Aβ)(αB) = 256 × 144 - 48 × 768 = 36864 - 36864 = 0

∴ A and B are independent.

Note. By proving Q = 0 also we can conclude that A and B are independent.


Problem 2. In a class test in which 135 candidates were examined for proficiency in Physics and Chemistry, it was discovered that 75 students failed in Physics, 90 failed in Chemistry and 50 failed in both. Find the magnitude of association and state if there is any association between failing in Physics and Chemistry.


Solution. Denoting 'fail in Physics' as A and 'fail in Chemistry' as B we get


(A) = 75, (B) = 90, (AB) = 50, N = 135
The magnitude of association is measured by


Q = [(AB)(αβ) - (Aβ)(αB)] / [(AB)(αβ) + (Aβ)(αB)]


We now find the ultimate class frequencies.
(α) = N - (A) = 135 - 75 = 60
(β) = N - (B) = 135 - 90 = 45
(αB) = (B) - (AB) = 90 - 50 = 40
(Aβ) = (A) - (AB) = 75 - 50 = 25
(αβ) = (α) - (αB) = 60 - 40 = 20


∴ Q = (50 × 20 - 25 × 40) / (50 × 20 + 25 × 40) = 0


∴ A and B are independent. Hence failure in Physics and failure in Chemistry are completely independent of each other.
Problem 3. Show whether A and B are independent, positively associated or negatively associated in the following cases.


(i) N = 930, (A) = 300, (B) = 400, (AB) = 230
(ii) (AB) = 327, (Aβ) = 545, (αB) = 741, (αβ) = 235
(iii) (A) = 470, (AB) = 300, (α) = 530, (αB) = 150
(iv) (AB) = 66, (Aβ) = 88, (αB) = 102, (αβ) = 136


Solution. (i) (A)(B)/N = (300 × 400)/930 = 129.03


Now, δ = (AB) - (A)(B)/N = 230 - 129.03 = 100.97


∴ δ > 0. Hence A and B are positively associated.


(ii) Q = [(AB)(αβ) - (Aβ)(αB)] / [(AB)(αβ) + (Aβ)(αB)]
= (327 × 235 - 545 × 741) / (327 × 235 + 545 × 741)
= (76845 - 403845) / (76845 + 403845)
= -327000/480690 = -0.6803


∴ Q < 0. Hence A and B are negatively associated.
(iii) N = (A) + (α) = 470 + 530 = 1000
(B) = (AB) + (αB) = 300 + 150 = 450


Now, (A)(B)/N = (470 × 450)/1000 = 211.5


∴ δ = (AB) - (A)(B)/N = 300 - 211.5 = 88.5


∴ δ > 0. Hence A and B are positively associated.


(iv) Q = [(AB)(αβ) - (Aβ)(αB)] / [(AB)(αβ) + (Aβ)(αB)]
= (66 × 136 - 88 × 102) / (66 × 136 + 88 × 102)
= (8976 - 8976) / (8976 + 8976) = 0


∴ A and B are independent.


Problem 4. Calculate the coefficient of association between the intelligence of father and son from the following data.


Intelligent fathers with intelligent sons | 200
Intelligent fathers with dull sons | 50
Dull fathers with intelligent sons | 110
Dull fathers with dull sons | 600
Comment on the result.


Solution. Denoting the 'intelligence of fathers' by A and the 'intelligence of sons' by B we have


(AB) = 200; (Aβ) = 50; (αB) = 110; (αβ) = 600


Q = [(AB)(αβ) - (Aβ)(αB)] / [(AB)(αβ) + (Aβ)(αB)]
= (200 × 600 - 50 × 110) / (200 × 600 + 50 × 110)
= 114500/125500
= 0.91235


Since Q is positive, intelligent fathers are likely to have intelligent sons.


Problem 5. Investigate from the following data the association between inoculation against small pox and prevention from attack.


 | Attacked | Not attacked | Total
Inoculated | 25 | 220 | 245
Not inoculated | 90 | 160 | 250
Total | 115 | 380 | 495


Solution. Denoting A as 'inoculated' and B as 'attacked' we have


(AB) = 25; (Aβ) = 220; (αB) = 90; (αβ) = 160


Q = [(AB)(αβ) - (Aβ)(αB)] / [(AB)(αβ) + (Aβ)(αB)]
= (25 × 160 - 220 × 90) / (25 × 160 + 220 × 90)
= (4000 - 19800) / (4000 + 19800)
= -15800/23800 = -0.6639


This shows that the attributes A and B have negative association (i.e., 'inoculation' and 'attack from small pox' are negatively associated).


Thus inoculation against small pox can be taken as a preventive measure.
Exercises.


1. Show whether A and B are independent, positively associated or negatively associated.


(i) N = 392, (A) = 154, (B) = 168, (AB) = 66
(ii) N = 1000, (A) = 470, (B) = 620, (AB) = 320
(iii) (A) = 28, (B) = 38, (AB) = 12, N = 60
(iv) (AB) = 90; (Aβ) = 65; (αB) = 260; (αβ) = 110
(v) (A) = 245, (α) = 285, (AB) = 147, (αB) = 190


2. In an examination in Tamil and English 245 of the candidates passed in Tamil, 147 passed in both, 285 failed in Tamil and 190 failed in Tamil but passed in English. How far is the knowledge in the two subjects associated?


3. Calculate Yule's coefficient of association between marriage and failure of students.


 | Passed | Failed | Total
Married | 90 | 65 | 155
Unmarried | 260 | 110 | 370
Total | 350 | 175 | 525
UNIT XII- INDEX NUMBERS


12.1 INDEX NUMBERS


An index number is a widely used statistical device for comparing the level of a certain phenomenon with the level of the same phenomenon at some standard period. For example, we may wish to compare the price of a food article at a particular period with the price of the same article at a previous period of time. This comparison can be expressed as the percentage ratio of the prices in the two periods, and this number serves as a single food-price index number. The comparison of prices of several food articles at two different periods is usually expressed as a suitable weighted average of the percentage changes in these prices. In the calculation of averages, various standard measures of central tendency such as arithmetic mean, geometric mean and harmonic mean can be used.


In the computation of an index number, if the base year used for comparison is kept constant throughout, then the method is called the fixed base method. If, on the other hand, for every year the previous year is used as the base for comparison, then the method is called the chain base method.


Index numbers can be broadly classified into two types.
(i) Unweighted or simple index numbers.
(ii) Weighted index numbers.

Two standard methods of computation are


(A) Aggregate method.
(B) Average of price relatives method.


I-A Aggregate Method.


In this method the total of the current year prices for the various commodities is divided by the total of the base year prices. In symbols, if p0 denotes the price in the base year and p1 the price in the current year,


p01 = (Σp1/Σp0) × 100, where Σp1 is the total of the current year prices and Σp0 is the total of the base year prices.


This is the simplest method, in which the aggregates of the prices for the base year and the current year alone are taken into consideration.
Example 1. From the following data construct the simple aggregative index number for 1992.


Commodities | Price in 1991 (Rs.) | Price in 1992 (Rs.)
Rice | 7 | 8
Wheat | 3.5 | 3.75
Oil | 40 | 45
Gas | 78 | 85
Flour | 4.5 | 5.25


Solution. Construction of the price index taking 1991 as the base year.


Commodities | Price in 1991 (Rs.) | Price in 1992 (Rs.)
Rice | 7 | 8
Wheat | 3.5 | 3.75
Oil | 40 | 45
Gas | 78 | 85
Flour | 4.5 | 5.25
Total | 133.0 | 147.00
∴ Aggregate index number p01 = (Σp1/Σp0) × 100
= (147/133) × 100
= 110.5
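
The aggregate method amounts to one division. The following minimal sketch reproduces Example 1; the function name `simple_aggregate_index` is an illustrative assumption.

```python
def simple_aggregate_index(base_prices, current_prices):
    """Simple aggregative index: (sum of p1 / sum of p0) x 100."""
    return sum(current_prices) / sum(base_prices) * 100


# Prices of rice, wheat, oil, gas and flour from Example 1.
p0 = [7, 3.5, 40, 78, 4.5]      # 1991 (base year)
p1 = [8, 3.75, 45, 85, 5.25]    # 1992 (current year)

print(round(simple_aggregate_index(p0, p1), 1))  # 110.5
```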


I-B Average of Price Relatives Method (Simple index numbers).


Price relatives. Denoting the price of a commodity in the base year as p0 and the price in the current year as p1, the ratio p1/p0 is called the price relative.


Index number for the current year is p01 = (p1/p0) × 100


In the average of price relatives method the average of the price relatives for the various items is calculated by using any one of the measures of central tendency such as arithmetic mean, geometric mean, harmonic mean, etc. Arithmetic and geometric means are the averages most commonly used in this method.


(i) The arithmetic mean index number p01 = Σ[(p1/p0) × 100] / n, where n is the number of items.


(ii) The geometric mean index number p01 = [Π(p1/p0)]^(1/n) × 100, where Π denotes the product.


Hence log p01 = Σ log[(p1/p0) × 100] / n

Example. For the data of Example 1, we find the index number of the price relatives taking 1991 as the base year using (i) arithmetic mean (ii) geometric mean.


Commodities | Price in 1991 (p0) | Price in 1992 (p1) | (p1/p0) × 100 | log[(p1/p0) × 100]
Rice | 7 | 8 | 114.3 | 2.0580
Wheat | 3.5 | 3.75 | 107.1 | 2.0298
Oil | 40 | 45 | 112.5 | 2.0512
Gas | 78 | 85 | 109.0 | 2.0374
Flour | 4.5 | 5.25 | 116.7 | 2.0671
Total | | | 559.6 | 10.2435

(i) Using arithmetic mean the index number p01 = 559.6/5 = 111.92
(ii) Using geometric mean the index number is given by

log p01 = 10.2435/5 = 2.0487

∴ p01 = antilog (2.0487) = 111.87
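
Both averages of price relatives can be computed directly, without tables of logarithms. The sketch below is illustrative: the function names are assumptions, and because it does not round the individual relatives, its results agree with the worked example only up to rounding in the second decimal place.

```python
from math import prod


def price_relatives(p0, p1):
    """Price relatives (p1/p0) x 100 for each commodity."""
    return [current / base * 100 for base, current in zip(p0, p1)]


def arithmetic_mean_index(p0, p1):
    relatives = price_relatives(p0, p1)
    return sum(relatives) / len(relatives)


def geometric_mean_index(p0, p1):
    relatives = price_relatives(p0, p1)
    return prod(relatives) ** (1 / len(relatives))


p0 = [7, 3.5, 40, 78, 4.5]      # 1991 prices
p1 = [8, 3.75, 45, 85, 5.25]    # 1992 prices

am = arithmetic_mean_index(p0, p1)
gm = geometric_mean_index(p0, p1)
print(round(am, 2), round(gm, 2))
```

The geometric mean index is never larger than the arithmetic mean index for the same relatives.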
Solved Problems
Problem 1. From the following data of the wholesale price of rice for the years 1987 to 1992 construct the index numbers taking (i) 1987 as the base (ii) 1990 as the base:


Years | 1987 | 1988 | 1989 | 1990 | 1991 | 1992
Price of rice per kg. | 5.00 | 6.00 | 6.50 | 7.00 | 7.50 | 8.00


Solution. (i) Construction of index numbers taking 1987 as base


Years | Price of rice per kg. | Index number (base 1987)
1987 | 5.00 | (5/5) × 100 = 100
1988 | 6.00 | (6/5) × 100 = 120
1989 | 6.50 | (6.5/5) × 100 = 130
1990 | 7.00 | (7/5) × 100 = 140
1991 | 7.50 | (7.5/5) × 100 = 150
1992 | 8.00 | (8/5) × 100 = 160


From the index number table we observe that from 1987 to 1988 there is an increase of 20% in the price of rice per kg, from 1987 to 1989 there is an increase of 30% in the price of rice per kg, etc.


(ii) Construction of index numbers taking 1990 as base
Years | Price of rice per kg. | Index number (base 1990)
1987 | 5.00 | (5/7) × 100 = 71.4
1988 | 6.00 | (6/7) × 100 = 85.7
1989 | 6.50 | (6.5/7) × 100 = 92.9
1990 | 7.00 | (7/7) × 100 = 100
1991 | 7.50 | (7.5/7) × 100 = 107.1
1992 | 8.00 | (8/7) × 100 = 114.3

Problem 2. Construct the wholesale price index numbers for 1991 and 1992 from the data given below using 1990 as the base year.


Commodity | Wholesale prices in Rupees per quintal
 | 1990 | 1991 | 1992
Rice | 700 | 750 | 825
Wheat | 540 | 575 | 600
Ragi | 300 | 325 | 310
Cholam | 250 | 280 | 295
Flour | 320 | 330 | 335
Ravai | 325 | 350 | 360
Solution. Taking 1990 as base year


Commodity | 1990 (p0) | 1991 (p1) | 1992 (p2) | Relatives for 91 | Relatives for 92
Rice | 700 | 750 | 825 | (750/700) × 100 = 107.1 | (825/700) × 100 = 117.9
Wheat | 540 | 575 | 600 | (575/540) × 100 = 106.5 | (600/540) × 100 = 111.1
Ragi | 300 | 325 | 310 | (325/300) × 100 = 108.3 | (310/300) × 100 = 103.3
Cholam | 250 | 280 | 295 | (280/250) × 100 = 112 | (295/250) × 100 = 118
Flour | 320 | 330 | 335 | (330/320) × 100 = 103.1 | (335/320) × 100 = 104.7
Ravai | 325 | 350 | 360 | (350/325) × 100 = 107.7 | (360/325) × 100 = 110.8
Total | | | | 644.7 | 665.8
Index number (using A.M.) | | | | 107.5 | 111.0


Index number for 1991 with base year 1990 is 107.5
Index number for 1992 with base year 1990 is 111.0
Exercises.


1. From the following data construct the aggregate index number for 1991 taking 1990 as the base:


Commodities | Price in 1990 (Rs.) | Price in 1991 (Rs.)
A | 50 | 70
B | 40 | 60
C | 80 | 90
D | 110 | 120
E | 20 | 20


2. For the data given below calculate the index numbers taking (i) 1984 as base year (ii) 1991 as base year


Year | 1984 | 1985 | 1986 | 1987 | 1989 | 1990 | 1991 | 1992
Price of wheat per kg. | 4 | 5 | 6 | 7 | 8 | 10 | 9 | 10

II Weighted Index Numbers


All items in the calculation of unweighted index numbers (simple index numbers) are treated as of equal importance. But in actual practice we notice that some items command greater importance than others and as such need more weight in the calculation of index numbers.


Standard methods of computing weighted index numbers are:
II-A Weighted aggregative method.
II-B Weighted average of price relatives method.


II-A Weighted aggregative method. Though there are many formulae to calculate index numbers in this method, we give below some standard formulae which are very often used. Throughout, p0 and q0 denote the price and quantity in the base year, and p1 and q1 the price and quantity in the current year.


(a) Laspeyre's index number. According to Laspeyre's method the prices of the commodities in the base year as well as the current year are known and they are weighted by the quantities used in the base year. Laspeyre's index number is defined to be


L01 = (Σp1q0/Σp0q0) × 100

(b) Paasche's index number. According to Paasche's method the current year quantities are taken as weights and hence Paasche's index number is defined by

P01 = (Σp1q1/Σp0q1) × 100

(c) Marshall-Edgeworth's index number. According to this method the weight is the sum of the quantities of the base period and the current period. Hence Marshall-Edgeworth's formula is defined by


M01 = [Σp1(q0 + q1) / Σp0(q0 + q1)] × 100
= [(Σp1q0 + Σp1q1) / (Σp0q0 + Σp0q1)] × 100


(d) Bowley's index number. The arithmetic mean of Laspeyre's and Paasche's index numbers is defined to be Bowley's index number. Hence Bowley's index number is given by


B01 = (L01 + P01)/2
= (1/2)[Σp1q0/Σp0q0 + Σp1q1/Σp0q1] × 100


(e) Fisher's index number. Prof. Irving Fisher, though he suggested many index numbers, gives an 'ideal index number' as


I01 = √[(Σp1q0/Σp0q0) × (Σp1q1/Σp0q1)] × 100


We notice that I01 = √(L01 × P01). That is, Fisher's index number is the geometric mean of Laspeyre's index number and Paasche's index number.


(f) Kelley's index number. According to Kelley, the weights may be taken as the quantities of a period which is not necessarily the base year or the current year. The average quantity of two or more years may be taken as the weight. Hence Kelley's index number can be defined as


K01 = (Σp1q/Σp0q) × 100, where q is the average quantity of two or more years.


We notice that the Marshall-Edgeworth index number is the same as Kelley's index number if the average quantity of the two years is considered.
Example. Calculate (i) Laspeyre's (ii) Paasche's (iii) Fisher's index numbers for the data given below. Hence or otherwise find Edgeworth's and Bowley's index numbers.


Commodities | Base year 1990 | | Current year 1992 |
 | Price | Quantity | Price | Quantity
A | 2 | 10 | 3 | 12
B | 5 | 16 | 6.5 | 11
C | 3.5 | 18 | 4 | 16
D | 7 | 21 | 9 | 25
E | 3 | 11 | 3.5 | 20
Solution.
Commodities | p0 | q0 | p1 | q1 | p0q0 | p0q1 | p1q0 | p1q1
A | 2 | 10 | 3 | 12 | 20 | 24 | 30 | 36
B | 5 | 16 | 6.5 | 11 | 80 | 55 | 104 | 71.5
C | 3.5 | 18 | 4 | 16 | 63 | 56 | 72 | 64
D | 7 | 21 | 9 | 25 | 147 | 175 | 189 | 225
E | 3 | 11 | 3.5 | 20 | 33 | 60 | 38.5 | 70
Total | | | | | 343 | 370 | 433.5 | 466.5


(i) Laspeyre's index number = (Σp1q0/Σp0q0) × 100
= (433.5/343) × 100 = 126.4
(ii) Paasche's index number = (Σp1q1/Σp0q1) × 100
= (466.5/370) × 100
= 126.1


(iii) Fisher's index number = √[(Σp1q0/Σp0q0) × (Σp1q1/Σp0q1)] × 100
= √[(433.5 × 466.5)/(343 × 370)] × 100
= 126.2


(iv) Bowley's index number = (L01 + P01)/2
= (126.4 + 126.1)/2
= 126.25


(v) Edgeworth's index number = [(Σp1q0 + Σp1q1)/(Σp0q0 + Σp0q1)] × 100
= [(433.5 + 466.5)/(343 + 370)] × 100
= (900/713) × 100
= 126.2
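
The five weighted aggregative formulae share the same four sums, so they can be computed together. This is a minimal sketch under that observation; the function name `weighted_indices` and the dictionary keys are illustrative assumptions. Bowley's value is computed from the unrounded Laspeyre and Paasche values, so it can differ slightly from the hand calculation, which averages the rounded figures.

```python
from math import sqrt


def weighted_indices(p0, q0, p1, q1):
    """Laspeyre, Paasche, Fisher, Bowley and Marshall-Edgeworth indices."""
    def total(prices, quantities):
        return sum(p * q for p, q in zip(prices, quantities))

    p0q0, p0q1 = total(p0, q0), total(p0, q1)
    p1q0, p1q1 = total(p1, q0), total(p1, q1)

    laspeyre = p1q0 / p0q0 * 100
    paasche = p1q1 / p0q1 * 100
    return {
        "laspeyre": laspeyre,
        "paasche": paasche,
        "fisher": sqrt(laspeyre * paasche),
        "bowley": (laspeyre + paasche) / 2,
        "marshall_edgeworth": (p1q0 + p1q1) / (p0q0 + p0q1) * 100,
    }


# Data of the example: commodities A to E, base year 1990, current year 1992.
p0, q0 = [2, 5, 3.5, 7, 3], [10, 16, 18, 21, 11]
p1, q1 = [3, 6.5, 4, 9, 3.5], [12, 11, 16, 25, 20]

result = weighted_indices(p0, q0, p1, q1)
for name, value in result.items():
    print(name, round(value, 1))
```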
II-B Weighted Average of Price Relatives Method


In this method the index number is computed by taking the weighted arithmetic mean of the price relatives. Thus if P = (p1/p0) × 100 is the price relative and V is the value weight p0q0, then the index number is p01 = ΣPV/ΣV.

Example.
Index number using the weighted arithmetic mean of price relatives.
Commodity | Price in 1990 (p0) | Price in 1992 (p1) | Quantity in 1990 (q0) | V = p0q0 | P = (p1/p0) × 100 | PV
Coconut oil | Rs. 50 | Rs. 54 | 15 lit | 750 | 108 | 81000
Groundnut oil | Rs. 45 | Rs. 48 | 25 lit | 1125 | 106.7 | 120037.5
Gingelly oil | Rs. 43 | Rs. 45 | 30 lit | 1290 | 104.7 | 135063
Rice | Rs. 7 | Rs. 9 | 350 kg | 2450 | 128.6 | 315070
Total | | | | 5615 | - | 651170.5


∴ Weighted index number = ΣPV/ΣV = 651170.5/5615 = 116
Ideal index number. An index number is said to be an ideal index number if it satisfies the following three tests.


(i) The time reversal test.
(ii) The factor reversal test.
(iii) The commodity reversal test.


(i) The time reversal test. Let I(01) denote the index number of the current year y1 relative to the base year y0, without considering percentage, and let I(10) denote the index number of the base year y0 relative to the current year y1, without considering percentage. If I(01) × I(10) = 1, then we say that the index number satisfies the time reversal test.

(ii) The factor reversal test. In this test the prices and quantities are interchanged, without considering percentage, and the index number is required to satisfy the relation I(pq) × I(qp) = Σp1q1/Σp0q0, where I(pq) is the price index of the current year relative to the base year and I(qp) is the quantity index of the current year relative to the base year.

(iii) The commodity reversal test. The index number should be independent of the order in which the different commodities are considered. This test is satisfied by almost all index numbers.

Remark 1. Fisher's index number is an ideal index number.


We verify that Fisher's index number satisfies the three tests for an ideal index number.


Fisher's index number is I(01) = √[(Σp1q0/Σp0q0) × (Σp1q1/Σp0q1)]


Time reversal test. Interchanging the base year and the current year,


I(10) = √[(Σp0q1/Σp1q1) × (Σp0q0/Σp1q0)]


Now, I(01) × I(10) = √[(Σp1q0/Σp0q0) × (Σp1q1/Σp0q1) × (Σp0q1/Σp1q1) × (Σp0q0/Σp1q0)]
= 1


Factor reversal test. Denote Fisher's index number for the prices p and quantities q as


I(pq) = √[(Σp1q0/Σp0q0) × (Σp1q1/Σp0q1)]
Interchanging the prices and quantities in I(pq) we get


I(qp) = √[(Σq1p0/Σq0p0) × (Σq1p1/Σq0p1)]

Now, I(pq) × I(qp) = √[(Σp1q0/Σp0q0) × (Σp1q1/Σp0q1) × (Σp0q1/Σp0q0) × (Σp1q1/Σp1q0)]
= √[(Σp1q1)²/(Σp0q0)²]
= Σp1q1/Σp0q0


Obviously Fisher's index number satisfies the commodity reversal test. Hence Fisher's index number is an ideal index number.


Note. Of all the index numbers defined earlier, Fisher's index number is the only one which is an ideal index number.

Remark 2. Laspeyre's index number does not satisfy the time reversal test.

We have L01 = Σp1q0/Σp0q0 and L10 = Σp0q1/Σp1q1.

Now, L01 × L10 = (Σp1q0/Σp0q0) × (Σp0q1/Σp1q1) ≠ 1, in general.


Hence Laspeyre's index number does not satisfy the time reversal test.


Remark 3. Paasche's index number does not satisfy the time reversal test. (Verify)


Remark 4. Laspeyre's and Paasche's index numbers also do not satisfy the factor reversal test. (Verify)


Solved Problems.


Problem 1. Construct, with the help of the data given below, Fisher's index number and show that it satisfies both the factor reversal test and the time reversal test.


Commodity | A | B | C | D
Base year price in Rupees | 5 | 6 | 4 | 3
Base year quantity in Quintals | 50 | 40 | 120 | 30
Current year price in Rupees | 7 | 8 | 5 | 4
Current year quantity in Quintals | 60 | 50 | 110 | 35
Solution.


Commodity | p0 | q0 | p1 | q1 | p0q0 | p0q1 | p1q0 | p1q1
A | 5 | 50 | 7 | 60 | 250 | 300 | 350 | 420
B | 6 | 40 | 8 | 50 | 240 | 300 | 320 | 400
C | 4 | 120 | 5 | 110 | 480 | 440 | 600 | 550
D | 3 | 30 | 4 | 35 | 90 | 105 | 120 | 140
Total | | | | | 1060 | 1145 | 1390 | 1510
Fisher's index number is I01 = √[(Σp1q0/Σp0q0) × (Σp1q1/Σp0q1)] × 100
= √[(1390/1060) × (1510/1145)] × 100
= 131.5 (approximately)
Time reversal test.
I(01) = √[(Σp1q0/Σp0q0) × (Σp1q1/Σp0q1)] = √[(1390/1060) × (1510/1145)]
I(10) = √[(Σp0q1/Σp1q1) × (Σp0q0/Σp1q0)] = √[(1145/1510) × (1060/1390)]
Now, I(01) × I(10) = √[(1390/1060) × (1510/1145) × (1145/1510) × (1060/1390)] = 1.
Hence the time reversal test is satisfied.
Factor reversal test.
I(pq) = √[(Σp1q0/Σp0q0) × (Σp1q1/Σp0q1)] = √[(1390/1060) × (1510/1145)]
Interchanging the factors,
I(qp) = √[(Σp0q1/Σp0q0) × (Σp1q1/Σp1q0)] = √[(1145/1060) × (1510/1390)]
Now, I(pq) × I(qp) = √[(1390/1060) × (1510/1145) × (1145/1060) × (1510/1390)]
= 1510/1060 = Σp1q1/Σp0q0
Hence the factor reversal test is also satisfied.
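
The two reversal tests can also be verified numerically for the data of Problem 1. The sketch below is illustrative only; the helper names `aggregate` and `fisher_ratio` are assumptions, and the index is kept as a ratio (without the factor 100), as the tests require.

```python
from math import isclose, sqrt


def aggregate(prices, quantities):
    """Sum of price x quantity over all commodities."""
    return sum(p * q for p, q in zip(prices, quantities))


def fisher_ratio(p0, q0, p1, q1):
    """Fisher's index without the factor 100."""
    laspeyre = aggregate(p1, q0) / aggregate(p0, q0)
    paasche = aggregate(p1, q1) / aggregate(p0, q1)
    return sqrt(laspeyre * paasche)


p0, q0 = [5, 6, 4, 3], [50, 40, 120, 30]     # base year
p1, q1 = [7, 8, 5, 4], [60, 50, 110, 35]     # current year

i_01 = fisher_ratio(p0, q0, p1, q1)
i_10 = fisher_ratio(p1, q1, p0, q0)          # base and current year interchanged
i_qp = fisher_ratio(q0, p0, q1, p1)          # prices and quantities interchanged

print(round(i_01 * 100, 1))
print("time reversal:", isclose(i_01 * i_10, 1.0))
print("factor reversal:", isclose(i_01 * i_qp, aggregate(p1, q1) / aggregate(p0, q0)))
```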


Exercises.


1. Construct index numbers of prices from the following data by applying (i) Laspeyre's method (ii) Paasche's method (iii) Bowley's method (iv) Fisher's index method (v) Marshall-Edgeworth method.
Commodities | Base year | | Current year |
 | Price | Quantity | Price | Quantity
A | 2 | 8 | 4 | 6
B | 5 | 10 | 6 | 5
C | 4 | 14 | 5 | 10
D | 2 | 19 | 2 | 13


2. Calculate Fisher's index number for 1992 for the following data.


Year | Rice | | Wheat | | Flour |
 | Price | Quantities | Price | Quantities | Price | Quantities
1988 | 9.3 | 100 | 6.4 | 11 | 5.1 | 5
1992 | 4.5 | 90 | 3.7 | 10 | 2.7 | 3


3. For the data given below find the different weighted index numbers.


Commodities | Base year | | Current year |
 | Price | Quantities | Price | Quantities
A | 6 | 50 | 10 | 56
B | 2 | 100 | 2 | 120
C | 4 | 60 | 6 | 60
D | 10 | 30 | 12 | 24
E | 8 | 40 | 12 | 26
12.2 CONSUMER PRICE INDEX NUMBERS (Cost of living index numbers)


It is the index number designed to measure changes in the living cost of various classes of people. No single consumer price index will suit all classes of people because different classes of people differ widely from each other in the style of functioning, consumption habits, mode of expenditure etc. These index numbers are useful for wage negotiations and wage contracts. Dearness allowance is calculated based on the cost of living index numbers.


Formulae to find the cost of living index numbers.
(i) Aggregate expenditure method.


Cost of living index number I01 = (Σp1q0/Σp0q0) × 100 (Laspeyre's method)
(ii) Family budget method.


Cost of living index I01 = ΣPV/ΣV, where P = (p1/p0) × 100 and V is the value weight p0q0.


Also I01 = ΣPW/ΣW, where P = (p1/p0) × 100 and W is the weight.

Solved Problems.


Problem 1. Find the cost of living index number for 1992 with 1991 as the base from the following data using (i) the family budget method (ii) the aggregate expenditure method.


Commodity | Price in Rs. 1991 | Price in Rs. 1992 | Quantity in Quintals in 1991
Rice | 7 | 7.5 | 6
Wheat | 6 | 6.75 | 3.5
Flour | 5 | 5 | 0.5
Oil | 30 | 32 | 3
Sugar | 8 | 8.5 | 1
Solution. (i) By the family budget method.


Commodity | p0 | p1 | q0 | V = p0q0 | P = (p1/p0) × 100 | PV
Rice | 7 | 7.5 | 6 | 42 | 107.1 | 4498.2
Wheat | 6 | 6.75 | 3.5 | 21 | 112.5 | 2362.5
Flour | 5 | 5 | 0.5 | 2.5 | 100 | 250
Oil | 30 | 32 | 3 | 90 | 106.7 | 9603
Sugar | 8 | 8.5 | 1 | 8 | 106.3 | 850.4
Total | | | | 163.5 | - | 17564.1
Cost of living index = ΣPV/ΣV
= 17564.1/163.5
= 107.4
(ii) By the aggregate expenditure method.
Commodity | p0 | p1 | q0 | p0q0 | p1q0
Rice | 7 | 7.5 | 6 | 42 | 45
Wheat | 6 | 6.75 | 3.5 | 21 | 23.6
Flour | 5 | 5 | 0.5 | 2.5 | 2.5
Oil | 30 | 32 | 3 | 90 | 96
Sugar | 8 | 8.5 | 1 | 8 | 8.5
Total | | | | 163.5 | 175.6


Cost of living index = (Σp1q0/Σp0q0) × 100
= (175.6/163.5) × 100
= 107.4
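
Both methods of Problem 1 give the same answer because, with value weights V = p0q0, the sum ΣPV reduces to 100 × Σp1q0. The sketch below illustrates this; the function names are assumptions introduced for the illustration.

```python
from math import isclose


def aggregate_expenditure_index(p0, p1, q0):
    """Cost of living index by the aggregate expenditure (Laspeyre) method."""
    current = sum(p * q for p, q in zip(p1, q0))
    base = sum(p * q for p, q in zip(p0, q0))
    return current / base * 100


def family_budget_index(p0, p1, q0):
    """Cost of living index by the family budget method, with V = p0 * q0."""
    values = [p * q for p, q in zip(p0, q0)]
    relatives = [c / b * 100 for b, c in zip(p0, p1)]
    return sum(r * v for r, v in zip(relatives, values)) / sum(values)


# Rice, wheat, flour, oil and sugar from Problem 1.
p0 = [7, 6, 5, 30, 8]           # 1991 prices
p1 = [7.5, 6.75, 5, 32, 8.5]    # 1992 prices
q0 = [6, 3.5, 0.5, 3, 1]        # 1991 quantities

a = aggregate_expenditure_index(p0, p1, q0)
f = family_budget_index(p0, p1, q0)
print(round(a, 1), round(f, 1))
print(isclose(a, f))
```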
Problem 2. An enquiry into the budgets of the middle class families in a city in India gave the following information.


 | Food | Rent | Clothing | Fuel | Misc.
Weights | 35% | 15% | 20% | 10% | 20%
Prices 1991 | 1500 | 300 | 450 | 70 | 500
Prices 1992 | 1650 | 325 | 500 | 90 | 550


What changes in the cost of living index of 1992 as compared with that of 1991 are seen?


Solution. The base year is chosen as 1991 (= 100).


Items | Prices 1991 (p0) | Prices 1992 (p1) | Index number 1992, P = (p1/p0) × 100 | W | PW
Food | 1500 | 1650 | (1650/1500) × 100 = 110 | 35 | 3850
Rent | 300 | 325 | (325/300) × 100 = 108.3 | 15 | 1624.5
Clothing | 450 | 500 | (500/450) × 100 = 111.1 | 20 | 2222.0
Fuel | 70 | 90 | (90/70) × 100 = 128.6 | 10 | 1286
Misc. | 500 | 550 | (550/500) × 100 = 110 | 20 | 2200
Total | | | | 100 | 11182.5


Cost of living index number = ΣPW/ΣW
= 11182.5/100
= 111.8


Hence the prices in 1992 compared with the prices in 1991 have risen by 11.8%.
Problem 3. Find the cost of living index for the following data in a 
middle class family. 


Items         | Price 1991 | Price 1992 | Weight 
Food          | 700        | 850        | 40 
Clothing      | 300        | 280        | 15 
Rent          | 200        | 225        | 7 
Fuel          | 70         | 82         | 5 
Medicine      | 100        | 135        | 9 
Education     | 500        | 550        | 12 
Entertainment | 100        | 90         | 10 
Misc.         | 475        | 425        | 23 
Solution. 

Items         | Price 1991 | Price 1992 | Index number 1992, $P$  | $W$ | $PW$ 
Food          | 700        | 850        | (850/700) x 100 = 121.4 | 40  | 4856 
Clothing      | 300        | 280        | (280/300) x 100 = 93.3  | 15  | 1399.5 
Rent          | 200        | 225        | (225/200) x 100 = 112.5 | 7   | 787.5 
Fuel          | 70         | 82         | (82/70) x 100 = 117.1   | 5   | 585.5 
Medicine      | 100        | 135        | (135/100) x 100 = 135   | 9   | 1215 
Education     | 500        | 550        | (550/500) x 100 = 110   | 12  | 1320 
Entertainment | 100        | 90         | (90/100) x 100 = 90     | 10  | 900 
Misc.         | 475        | 425        | (425/475) x 100 = 89.5  | 23  | 2058.5 
Total         |            |            |                         | 121 | 13122 

$\text{Cost of living index number} = \dfrac{\sum PW}{\sum W} = \dfrac{13122}{121} = 108.4$ 
Exercises. 


1. Calculate the index number of prices for 1992 on the basis of 
1990 from the data given below. 


Commodities | Weights | Price per unit 1990 | Price per unit 1992 
A           | 40      | 80                  | 85 
B           | 25      | 60                  | 55 
C           | 5       | 345                 | 50 
D           | 20      | 35                  | 40 
E           | 10      | 25                  | 20 


2. The following are the group index numbers and the group 
weights of an average working class family's budget. Construct the cost 
of living index number by assigning the given weights. 


Group              | Index number | Weight 
Food               | 352          | 48 
Fuel & Electricity | 220          | 10 
Clothing           | 30           | 8 
Rent               | 160          | 12 
Miscellaneous      | 190          | 15 
UNIT-XIII ANALYSIS OF TIME SERIES 


13.1 INTRODUCTION 


Economists and businessmen are interested in studying the behaviour 
of sales, profits, national income, foreign exchange, industrial 
production, dividends of companies, share prices in share markets, etc. 
over a period of time. Such a study helps them in planning for the future. 
Analysis of time series deals with the methods of analysing such factors 
which vary with respect to time and deriving information regarding the 
likely future behaviour of those factors. 


13.2 TIME SERIES 


Definition. Time series is a series of values of a variable over a 
period of time arranged chronologically. 


For example, the following table giving the price level of a 
commodity in different years forms a time series. 


Year        | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 
Price Level | 100  | 111  | 117  | 120  | 130  | 135 


Mathematically a time series is a functional relationship y = f(t) 
where y is the value of the variable (phenomenon or factor) under 
consideration at a time t. 


In general a time series is influenced by a large number of forces 
of different kinds. For example, the time series of retail prices of rice is 
the result of the combined influences of rainfall, availability of fertilisers, 
good yield, transport facilities, consumers' demand and so on. 


13.3 USES OF TIME SERIES 


• The most important use of studying time series is that it helps us to 
predict the future behaviour of the variable based on past experience. 


• It is helpful for business planning as it helps in comparing the actual 
current performance with the expected one. 


• From time series, we get to study the past behaviour of the 
phenomenon or the variable under consideration. 


• We can compare the changes in the values of different variables at 
different times or places, etc. 


13.4 COMPONENTS OF A TIME SERIES 


The various reasons or the forces which affect the values of an 
observation in a time series are the components of a time series. The four 
categories of the components of time series are 


• Trend 
• Seasonal Variations 
• Cyclic Variations 
• Random or Irregular Movements 


Seasonal and Cyclic Variations are the periodic changes or short-term 
fluctuations. 


[Figure: Components of Time Series. Long-term Movement or Trend; Short-term Movements (Seasonal Variations, Cyclic Variations); Random or Irregular Movements] 


Trend 


The trend shows the general tendency of the data to increase or decrease 
during a long period of time. A trend is a smooth, general, long-term, 
average tendency. It is not always necessary that the increase or decrease is 
in the same direction throughout the given period of time. 


The tendency may be to increase, decrease or remain stable in 
different sections of time, but the overall trend must be upward, downward 
or stable. Population, agricultural production, items manufactured, 
number of births and deaths, number of industries or factories, and number of 
schools or colleges are some examples showing some kind of 
tendency of movement. 
Linear and Non-Linear Trend 


If we plot the time series values on a graph against time t, the 
pattern of the data clustering shows the type of trend. If the data 
cluster more or less around a straight line, then the trend is linear; otherwise it 
is non-linear (curvilinear). 


Periodic Fluctuations 


There are some components in a time series which tend to repeat 
themselves over a certain period of time. They act in a regular spasmodic 
manner. 


Seasonal Variations 


These are the rhythmic forces which operate in a regular and periodic 
manner over a span of less than a year. They have the same or almost the 
same pattern during a period of 12 months. This variation will be present in 
a time series if the data are recorded hourly, daily, weekly, quarterly, or 
monthly. 


These variations come into play either because of natural forces or man-made 
conventions. The various seasons or climatic conditions play an 
important role in seasonal variations. For example, the production of crops depends 
on the seasons, the sale of umbrellas and raincoats rises in the rainy season, and the 
sale of electric fans and air conditioners shoots up in the summer season. 


The effect of man-made conventions such as festivals, customs, 
habits, fashions, and occasions like marriages is easily noticeable. 
They recur year after year. An upswing in a season should not 
be taken as an indicator of better business conditions. 


Cyclic Variations 


The variations in a time series which operate themselves over a span of 
more than one year are the cyclic variations. This oscillatory movement has 
a period of oscillation of more than a year. One complete period is a cycle. 
This cyclic movement is sometimes called the ‘Business Cycle’. 


It is a four-phase cycle comprising the phases of prosperity, recession, 
depression, and recovery. The cyclic variations may be regular but are not 
periodic. The upswings and the downswings in business depend upon the 
joint nature of the economic forces and the interaction between them. 


Random or Irregular Movements 


There are other factors which cause variation in the variable under 
study. These variations are not regular and are purely random or irregular. 
These fluctuations are unforeseen, uncontrollable, unpredictable, and 
erratic. They are caused by forces such as earthquakes, wars, floods, famines, and other 
disasters. 
UNIT-XIV MEASUREMENTS OF TRENDS 


14.1 MEASUREMENT OF TRENDS 


Secular trend is a long-term movement in a time series. This 
component represents the basic tendency of the series. The following 
methods are generally used to determine the trend in any given time series. 


• Graphic method or eye inspection method 
• Semi average method 
• Method of moving average 
• Method of least squares 

Graphic method or eye inspection method 


Graphic method is the simplest of all methods and easy to 
understand. The method is as follows. First plot the given time series data 
on a graph. Then a smooth freehand curve is drawn through the plotted 
points in such a way that it represents the general tendency of the series. As 
the curve is drawn through eye inspection, this is also called the 
eye-inspection method. The graphic method removes the short-term 
variations to show the basic tendency of the data. The trend line drawn 
through the graphic method can be extended further to predict or estimate 
values for future time periods. As the method is subjective, the 
prediction may not be reliable. 


Advantages 


It is the simplest method for studying trend values, and the trend is 
easy to draw. 


Sometimes the trend line drawn by a statistician experienced in 
computing trend may be considered better than a trend line fitted by 
the use of a mathematical formula. 


Although the freehand curve method is not recommended for 
beginners, it has considerable merits in the hands of experienced 
statisticians and is widely used in applied situations. 

Disadvantages: 


This method is highly subjective and the curve varies from person to 
person who draws it. 


The work must be handled by skilled and experienced people. 


Since the method is subjective, the prediction may not be reliable. 


While drawing a trend line through this method a careful job has 
to be done. 


Method of Semi Averages: 


In this method the whole data is divided into two equal parts with 
respect to time. For example, if we are given data from 1999 to 2016, i.e. 
over a period of 18 years, the two equal parts will be the first nine years, i.e. 
from 1999 to 2007, and the last nine years, from 2008 to 2016. In the case of an odd number of years like 
9, 13, 17 etc., two equal parts can be made simply by omitting the middle 
year. For example, if the data are given for 19 years from 1998 to 2016, 
the two equal parts would be from 1998 to 2006 and from 2008 to 2016; 
the middle year 2007 will be omitted. After the data have been divided 
into two parts, an average (arithmetic mean) of each part is obtained. We 
thus get two points. Each point is plotted against the mid year of its 
part. Then these two points are joined by a straight line which gives us 
the trend line. The line can be extended downwards or upwards to get 
intermediate values or to predict future values. 
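
The semi-average procedure can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, the years are assumed to be equally spaced (so that the mean of the years of a part is its mid year), and the figures are the production data of solved problem 1 of this unit (1982 to 1992).

```python
def semi_average_points(years, values):
    """Return the two (mid year, mean value) points of the semi-average method."""
    half = len(values) // 2      # with an odd count the middle year is left out
    first = (sum(years[:half]) / half, sum(values[:half]) / half)
    second = (sum(years[-half:]) / half, sum(values[-half:]) / half)
    return first, second


def trend_value(year, first, second):
    """Read a value off the straight line joining the two semi-average points."""
    (x1, y1), (x2, y2) = first, second
    slope = (y2 - y1) / (x2 - x1)
    return y1 + slope * (year - x1)


years = list(range(1982, 1993))
production = [45, 46, 44, 47, 42, 41, 39, 42, 45, 40, 48]

first, second = semi_average_points(years, production)
estimate_1993 = trend_value(1993, first, second)
print(first, second, round(estimate_1993, 1))
```

With 11 years the middle year 1987 is omitted, so the two points are the mean of 1982 to 1986 placed against 1984 and the mean of 1988 to 1992 placed against 1990.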


Advantages: 


This method is simple to understand as compared to the moving 
average method and the method of least squares. 


This is an objective method of measuring trend as everyone who 
applies this method is bound to get the same result. 


Disadvantages: 


The method assumes straight line relationship between the plotted 
points regardless of the fact whether that relationship exists or not. 


The main drawback of this method is that if more data are added to 
the original data, the whole calculation has to be done again for the new 
data to get the trend values, and the trend line also changes. 


As the Arithmetic Mean of each half is calculated, an extreme 
value in any half will greatly affect the points and hence trend calculated 
through these points may not be precise enough for forecasting the 
future. 


Method of Moving Average: 


It is a method for computing trend values in a time series which 
eliminates the short-term and random fluctuations from the time series by 
means of moving averages. The moving average of period m is a series of 
successive arithmetic means of m terms at a time, starting with the 1st, 2nd, 
3rd term and so on. The first average is the mean of the first m terms; the second 
average is the mean of the 2nd term to the (m+1)th term; the 3rd average is the 
mean of the 3rd term to the (m+2)th term, and so on. 


If m is odd then the moving average is placed against the mid 
value of the time interval it covers. But if m is even then the moving 
average lies between the two middle periods and does not correspond 
to any time period. So a further step has to be taken to place the moving 
average against a particular period of time. For that we take the 2-period moving 
average of the moving averages, which corresponds to a particular time 
period. The resultant moving averages are the trend values. 
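
The placement rule for odd and even m can be sketched in Python. The function name `moving_average_trend` is illustrative, and the figures are the production data of solved problem 1 of this unit.

```python
def moving_average_trend(years, values, m):
    """Map each year to its moving-average trend value for period m."""
    simple = [sum(values[i:i + m]) / m for i in range(len(values) - m + 1)]
    if m % 2 == 0:
        # Even period: each simple average falls between two years, so
        # average successive pairs to place the result against a year.
        simple = [(a + b) / 2 for a, b in zip(simple, simple[1:])]
    offset = m // 2              # index of the first year that gets a value
    return {years[offset + i]: avg for i, avg in enumerate(simple)}


years = list(range(1982, 1993))
production = [45, 46, 44, 47, 42, 41, 39, 42, 45, 40, 48]

trend_3 = moving_average_trend(years, production, 3)   # values for 1983 to 1991
trend_4 = moving_average_trend(years, production, 4)   # values for 1984 to 1990
print(trend_3[1983], trend_4[1984])
```

For m = 4 the first two simple averages are 45.50 and 44.75, so the first centred value, placed against 1984, is their mean.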


Advantages: 
This method is simple to understand and easy to execute. 


It has the flexibility in application in the sense that if we add 
data for a few more time periods to the original data, the previous 
calculations are not affected and we get a few more trend values. 


It gives a correct picture of the long term trend if the trend is 
linear. 


If the period of moving average coincides with the period of 
oscillation (cycle), the periodic fluctuations are eliminated. 


The moving average has the advantage that it follows the general 
movements of the data and that its shape is determined by the data rather 
than the statistician's choice of mathematical function. 


Disadvantages: 


For a moving average of period 2m + 1, one does not get trend values for 
the first m and last m periods. 


As the trend path does not correspond to any mathematical 
function, it cannot be used for forecasting or predicting values for future 
periods. 


If the trend is not linear, the trend values calculated through 
moving averages may not show the true tendency of data. 


The choice of the period is sometimes left to the human judgment 
and hence may carry the effect of human bias. 


Method of Least Squares: 


This method is the most widely used in practice. It is a mathematical 
method and with its help a trend line is fitted to the data in such a manner 
that the following two conditions are satisfied. 

$\sum (Y - Y_c) = 0$, i.e. the sum of the deviations of the actual values 
of Y from the computed values of Y is zero. 


$\sum (Y - Y_c)^2$ is least, i.e. the sum of the squares of the deviations 
of the actual values from the computed values is least. 


The line obtained by this method is called the “line of best fit”. 
This method of least squares may be used either to fit a straight line trend 
or a parabolic trend. 


Measurement of seasonal variations: 


There is a simple method for measuring the seasonal variation 
which involves simple averages. 


Simple average method. 
Step 1. All the data are arranged by years and months (or quarters). 
Step 2. Compute the simple average (arithmetic mean) $\bar{x}_i$ for the i-th month. 


Step 3. Obtain the overall average $\bar{x}$ of these averages $\bar{x}_i$: 

$\bar{x} = \dfrac{\bar{x}_1 + \bar{x}_2 + \cdots + \bar{x}_{12}}{12}$ 


Step 4. Seasonal indices for different months are calculated by 
expressing each monthly average as a percentage of the overall average $\bar{x}$. 


Thus seasonal index for the i-th month $= \dfrac{\bar{x}_i}{\bar{x}} \times 100$. 
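
The four steps can be sketched in Python. The function name `seasonal_indices` is illustrative; for brevity the data are the season-wise figures of solved problem 6 below (four seasons instead of twelve months), so the overall average divides by 4 rather than 12.

```python
def seasonal_indices(data):
    """data maps each season (or month) to its values over the years."""
    # Step 2: simple average for each season.
    averages = {season: sum(v) / len(v) for season, v in data.items()}
    # Step 3: overall average of the seasonal averages.
    overall = sum(averages.values()) / len(averages)
    # Step 4: each seasonal average as a percentage of the overall average.
    return {season: avg / overall * 100 for season, avg in averages.items()}


# Step 1: data arranged by season and year (1990 to 1994).
data = {
    "Summer":  [68, 70, 68, 65, 60],
    "Monsoon": [60, 58, 63, 56, 55],
    "Autumn":  [61, 56, 68, 56, 55],
    "Winter":  [63, 60, 67, 55, 58],
}

indices = seasonal_indices(data)
for season, value in indices.items():
    print(season, round(value, 1))
```

A useful check is that the indices add up to 100 times the number of seasons (here 400), since they are percentages of their own average.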


Solved problems. 


Problem 1. Use the method of least squares to fit a straight line trend to the 
following data given from 1982 to 1992. Hence estimate the trend value for 
1993. 


Year                   | 82 | 83 | 84 | 85 | 86 | 87 | 88 | 89 | 90 | 91 | 92 
Production in quintals | 45 | 46 | 44 | 47 | 42 | 41 | 39 | 42 | 45 | 40 | 48 


Solution. Let the line of best fit be y = ax + b. 
Take X = x - 1987 and Y = y - 42. 

Then the line of best fit becomes Y = aX + b. 

The normal equations are 

$\sum XY = a \sum X^2 + b \sum X$ 

$\sum Y = a \sum X + nb$, where n = 11. 

From the table, -19 = 110a. Hence a = -19/110 = -0.17. 
17 = 11b. Hence b = 17/11 = 1.55. 

∴ The line of best fit is Y = -0.17X + 1.55, 
i.e. y = -0.17x + 1987 x 0.17 + 1.55 + 42. 


∴ y = -0.17x + 381.34 is the straight line trend. 


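x     | X = x - 1987 | y  | Y = y - 42 | XY  | $X^2$ 
1982  | -5           | 45 | 3          | -15 | 25 
83    | -4           | 46 | 4          | -16 | 16 
84    | -3           | 44 | 2          | -6  | 9 
85    | -2           | 47 | 5          | -10 | 4 
86    | -1           | 42 | 0          | 0   | 1 
87    | 0            | 41 | -1         | 0   | 0 
88    | 1            | 39 | -3         | -3  | 1 
89    | 2            | 42 | 0          | 0   | 4 
90    | 3            | 45 | 3          | 9   | 9 
91    | 4            | 40 | -2         | -8  | 16 
1992  | 5            | 48 | 6          | 30  | 25 
Total | 0            | -  | 17         | -19 | 110 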


From the trend line, when x = 1982, y = 44.4 


x = 1983, y = 44.23      x = 1984, y = 44.06 
x = 1985, y = 43.89      x = 1986, y = 43.72 
x = 1987, y = 43.55      x = 1988, y = 43.38 
x = 1989, y = 43.21      x = 1990, y = 43.04 
x = 1991, y = 42.87      x = 1992, y = 42.7 


Thus the trend values are 44.4, 44.23, 44.06, 43.89, 43.72, 43.55, 
43.38, 43.21, 43.04, 42.87, 42.7. The estimated trend value for 1993 is 
y = -0.17 x 1993 + 381.34 = 42.53. 
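
The fit can be reproduced with a short Python sketch that solves the two normal equations directly. The function name `fit_trend_line` and its arguments are illustrative; the origins 1987 and 42 are those chosen in the solution.

```python
def fit_trend_line(years, values, year_origin, value_origin):
    """Solve the normal equations for Y = aX + b,
    where X = x - year_origin and Y = y - value_origin."""
    X = [x - year_origin for x in years]
    Y = [y - value_origin for y in values]
    n = len(X)
    sum_x, sum_y = sum(X), sum(Y)
    sum_xy = sum(x * y for x, y in zip(X, Y))
    sum_xx = sum(x * x for x in X)
    a = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    b = (sum_y - a * sum_x) / n
    return a, b


years = list(range(1982, 1993))
production = [45, 46, 44, 47, 42, 41, 39, 42, 45, 40, 48]
a, b = fit_trend_line(years, production, 1987, 42)


def trend(year):
    """Trend value in the original units: y = a(x - 1987) + b + 42."""
    return a * (year - 1987) + b + 42


print(round(a, 2), round(b, 2), round(trend(1993), 2))
```

The sketch keeps a = -19/110 and b = 17/11 unrounded, so its trend values can differ in the second decimal place from those listed above, which use a and b rounded to two decimals.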




Problem 2. Calculate the seasonal variation indices from the following 
data. 

(Monthly sales in lakhs of Rs.) 

Month     | 1991 | 1992 | 1993 | 1994 | Total | $\bar{x}_i$ | Seasonal index $\frac{\bar{x}_i}{\bar{x}} \times 100$ 
January   | 10   | 11   | 11.5 | 13.5 | 46    | 11.5        | (11.5/12) x 100 = 95.8 
February  | 8.5  | 8.5  | 9    | 10   | 36    | 9           | (9/12) x 100 = 75 
March     | 10.5 | 12   | 11   | 12.5 | 46    | 11.5        | (11.5/12) x 100 = 95.8 
April     | 12   | 14   | 16   | 18   | 60    | 15          | (15/12) x 100 = 125 
May       | 10   | 9    | 12   | 15   | 46    | 11.5        | (11.5/12) x 100 = 95.8 
June      | 10.5 | 10.5 | 11   | 14   | 46    | 11.5        | (11.5/12) x 100 = 95.8 
July      | 12   | 14   | 13   | 17   | 56    | 14          | (14/12) x 100 = 116.7 
August    | 9    | 8    | 11   | 16   | 44    | 11          | (11/12) x 100 = 91.7 
September | 11   | 11   | 12.5 | 13.5 | 48    | 12          | (12/12) x 100 = 100 
October   | 10   | 9.5  | 11.5 | 13   | 44    | 11          | (11/12) x 100 = 91.7 
November  | 11   | 12.5 | 10.5 | 14   | 48    | 12          | (12/12) x 100 = 100 
December  | 12   | 13   | 15   | 16   | 56    | 14          | (14/12) x 100 = 116.7 
Total     |      |      |      |      |       | 144         | 
Average   |      |      |      |      |       | 12          | 


Problem 3. Compute the trend values by the method of 4 yearly moving 
average for the data given in problem 1. 

I    | II                     | III                   | IV                      | V                     | VI 
Year | Production in quintals | 4 yearly moving total | 4 yearly moving average | 2 period moving total | Trend values 
1982 | 45                     | -                     | -                       | -                     | - 
83   | 46                     | -                     | -                       | -                     | - 
     |                        | 182                   | 45.50                   |                       | 
84   | 44                     |                       |                         | 90.25                 | 45.13 
     |                        | 179                   | 44.75                   |                       | 
85   | 47                     |                       |                         | 88.25                 | 44.13 
     |                        | 174                   | 43.50                   |                       | 
86   | 42                     |                       |                         | 85.75                 | 42.88 
     |                        | 169                   | 42.25                   |                       | 
87   | 41                     |                       |                         | 83.25                 | 41.63 
     |                        | 164                   | 41.00                   |                       | 
88   | 39                     |                       |                         | 82.75                 | 41.38 
     |                        | 167                   | 41.75                   |                       | 
89   | 42                     |                       |                         | 83.25                 | 41.63 
     |                        | 166                   | 41.50                   |                       | 
90   | 45                     |                       |                         | 85.25                 | 42.63 
     |                        | 175                   | 43.75                   |                       | 
91   | 40                     | -                     | -                       | -                     | - 
1992 | 48                     | -                     | -                       | -                     | - 
Problem 4. Determine the suitable period of moving average for the data 
given in problem 1. 


[Figure: production plotted against the years 82 to 93] 


We observe that the data have peaks at the years 1983, 1985, 
1990 and 1992 (refer to the figure). 


Thus the data show 3 cycles with varying periods 2, 5, 2 
respectively. Hence the suitable period of moving average is taken to be 
the A.M. of these periods. 


Hence $\dfrac{2 + 5 + 2}{3} = 3$ is the period of the moving average. 
Problem 5. Calculate (i) three yearly moving average (ii) short term 
fluctuations for the data given in problem 1. 


I    | II                     | III                   | IV                       | V 
Year | Production in quintals | 3 yearly moving total | 3 yearly moving averages | Short term fluctuations (II - IV) 
1982 | 45                     | -                     | -                        | - 
83   | 46                     | 135                   | 45                       | 1 
84   | 44                     | 137                   | 45.7                     | -1.7 
85   | 47                     | 133                   | 44.3                     | 2.7 
86   | 42                     | 130                   | 43.3                     | -1.3 
87   | 41                     | 122                   | 40.7                     | 0.3 
88   | 39                     | 122                   | 40.7                     | -1.7 
89   | 42                     | 126                   | 42                       | 0 
90   | 45                     | 127                   | 42.3                     | 2.7 
91   | 40                     | 133                   | 44.3                     | -4.3 
1992 | 48                     | -                     | -                        | - 


Trend values for the given time series are given in column IV. 


Short term fluctuations are given in the last column. 
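
The short term fluctuations of Problem 5 can be checked with a brief Python sketch; each one is the actual value minus the 3 yearly moving average placed against the same year. The variable names are illustrative.

```python
years = list(range(1982, 1993))
production = [45, 46, 44, 47, 42, 41, 39, 42, 45, 40, 48]

fluctuations = {}
for i in range(1, len(production) - 1):
    # 3 yearly moving average centred on year i (the trend value).
    trend = sum(production[i - 1:i + 2]) / 3
    # Short term fluctuation = actual value - trend value.
    fluctuations[years[i]] = production[i] - trend

for year, value in fluctuations.items():
    print(year, round(value, 1))
```

No value is produced for 1982 or 1992, since a 3 yearly average cannot be centred on the first or the last year.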


Problem 6. Compute the seasonal indices for the following data by 
simple average method. 

Season  | 1990 | 1991 | 1992 | 1993 | 1994 
Summer  | 68   | 70   | 68   | 65   | 60 
Monsoon | 60   | 58   | 63   | 56   | 55 
Autumn  | 61   | 56   | 68   | 56   | 55 
Winter  | 63   | 60   | 67   | 55   | 58 

Solution. 

Year           | Summer                     | Monsoon                   | Autumn                    | Winter                    | Total 
1990           | 68                         | 60                        | 61                        | 63                        | 
1991           | 70                         | 58                        | 56                        | 60                        | 
1992           | 68                         | 63                        | 68                        | 67                        | 
1993           | 65                         | 56                        | 56                        | 55                        | 
1994           | 60                         | 55                        | 55                        | 58                        | 
Total          | 331                        | 292                       | 296                       | 303                       | 
Average        | 66.2                       | 58.4                      | 59.2                      | 60.6                      | 244.4 
Seasonal Index | (66.2/61.1) x 100 = 108.3  | (58.4/61.1) x 100 = 95.6  | (59.2/61.1) x 100 = 96.9  | (60.6/61.1) x 100 = 99.2  | 

Here the overall average is $\bar{x} = \dfrac{244.4}{4} = 61.1$. 
Exercises. 


1. Fit a straight line trend by the method of least squares to the 
following data. Assuming that the same rate of change continues, what 
would be the predicted earnings for the year 1977? 


Year                  | 1970 | 1971 | 1972 | 1973 | 1974 | 1975 | 1976 
Earnings in thousands | 1.5  | 1.8  | 2.0  | 2.3  | 2.4  | 2.6  | 3.0 


2. (i) Using three years moving average determine the trend. (ii) Also 
determine the short term fluctuations. 


Year                          | 1986 | 1987 | 1988 | 1989 | 1990 | 1991 | 1992 | 1993 | 1994 | 1995 
Production in lakhs of tonnes | 21   | 22   | 23   | 25   | 24   | 22   | 25   | 26   | 27   | 26 
DISTANCE EDUCATION - CBCS-(2018-19 Academic Year Onwards). 


Question Paper Pattern(ESE)- Theory 
(UG/PG/P.G Diploma Programmes) 


Time : 3 hours Maximum :75 Marks 
Part — A (10 X 2= 20 Marks) 
Answer all questions 


1. 
2. 
3. 
4. 
5. 
6. 
7. 
8. 
9. 
10. 
Part - B (5 X 5 = 25 Marks) 
Answer all questions choosing either (a) or (b) 


11.a. 
(or) 
b. 
12.a. 
(or) 
b. 
13.a. 
(or) 
b. 
14.a. 
(or) 
b. 
15.a. 
(or) 
b. 


Part -C (3 X 10 = 30 Marks) 
(Answer any 3 out of 5 questions) 


16. 
17. 
18. 
19. 
20. 